Utilizing crowdsourcing for the construction of Chinese-Mongolian speech corpus with evaluation mechanism

Rihai Su, Shumin Shi*, Meng Zhao, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

2 Citations (Scopus)

Abstract

Crowdsourcing has recently been adopted by many natural language processing groups as an alternative to costly traditional annotation. In this paper, we explore the use of the WeChat Official Account Platform (WOAP) to build a speech corpus and assess the feasibility of using WOAP followers (also known as contributors) to assemble a Mongolian speech corpus. A Mongolian language qualification test was used to filter out potentially unqualified participants. We gathered natural speech recordings from daily life and constructed a Chinese-Mongolian Speech Corpus (CMSC) of 31,472 utterances from 296 native speakers who are fluent in Mongolian, totalling 30.8 h of speech. An evaluation experiment was then performed, in which the contributors were asked to choose the correct sentence from a multiple-choice list to ensure the high quality of the corpus. The results obtained so far show that crowdsourcing for constructing the CMSC with an evaluation mechanism could be more effective than traditional experiments requiring expertise.

Original language: English
Title of host publication: Data Science - 3rd International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2017, Proceedings
Editors: Qilong Han, Beiji Zou, Xiaoning Peng, Zeguang Lu, Guanglu Sun, Weipeng Jing
Publisher: Springer Verlag
Pages: 55-65
Number of pages: 11
ISBN (Print): 9789811063879
DOI
Publication status: Published - 2017
Event: 3rd International Conference of Pioneer Computer Scientists, Engineers, and Educators, ICPCSEE 2017 - Changsha, China
Duration: 22 Sep 2017 to 24 Sep 2017

Publication series

Name: Communications in Computer and Information Science
Volume: 728
ISSN (Print): 1865-0929

Conference

Conference: 3rd International Conference of Pioneer Computer Scientists, Engineers, and Educators, ICPCSEE 2017
Country/Territory: China
City: Changsha
Period: 22/09/17 to 24/09/17
