TY - GEN
T1 - Construction of Uighur-Chinese parallel corpus
AU - Song, J. L.
AU - Dai, L.
N1 - Publisher Copyright:
© 2015 Taylor & Francis Group, London.
PY - 2015
Y1 - 2015
AB - A Uighur-Chinese parallel corpus is an important foundation for Uighur-Chinese cross-language information processing. Because it is a corpus of a minority language, it is relatively difficult to construct. In this paper, we discuss issues related to its construction. First, we introduce the selection of corpus resources. Second, to accelerate construction and improve the quality of the corpus, we develop an assistant construction system based on webpage content extraction, text deduplication, and related techniques. Using this system, we build a Uighur-Chinese parallel corpus of approximately 300,000 sentence pairs, together with a moderate-sized dictionary of person and place names. Finally, to evaluate the corpus, we build a demo Uighur-Chinese statistical translation system. The results provide preliminary evidence of the corpus's effectiveness.
UR - http://www.scopus.com/inward/record.url?scp=84949808642&partnerID=8YFLogxK
U2 - 10.1201/b18512-73
DO - 10.1201/b18512-73
M3 - Conference contribution
AN - SCOPUS:84949808642
SN - 9781138027756
T3 - Multimedia, Communication and Computing Application - Proceedings of the International Conference on Multimedia, Communication and Computing Application, MCCA 2014
SP - 353
EP - 356
BT - Multimedia, Communication and Computing Application - Proceedings of the International Conference on Multimedia, Communication and Computing Application, MCCA 2014
A2 - Leung, Ally
PB - CRC Press/Balkema
T2 - International Conference on Multimedia, Communication and Computing Application, MCCA 2014
Y2 - 15 October 2014 through 16 October 2014
ER -