TY - GEN
T1 - Pretreatment for speech machine translation
AU - Zhang, Xiaofei
AU - Feng, Chong
AU - Huang, Heyan
PY - 2010
Y1 - 2010
N2 - Natural spoken language contains many meaningless modal particles and dittographies. In addition, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is rather poor when ASR results are translated directly by MT (machine translation). It is therefore necessary to transform the irregular ASR results into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes the meaningless modal particles and dittographies, corrects the recognition errors, and punctuates the ASR results before machine translation. Experiments show that the approach obtains an MT BLEU score of 0.2497, an improvement of 18.4% over the MT baseline without pretreatment.
AB - Natural spoken language contains many meaningless modal particles and dittographies. In addition, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is rather poor when ASR results are translated directly by MT (machine translation). It is therefore necessary to transform the irregular ASR results into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes the meaningless modal particles and dittographies, corrects the recognition errors, and punctuates the ASR results before machine translation. Experiments show that the approach obtains an MT BLEU score of 0.2497, an improvement of 18.4% over the MT baseline without pretreatment.
KW - Automatic speech recognition
KW - Conditional random field model
KW - Pretreatment
KW - Speech machine translation
UR - http://www.scopus.com/inward/record.url?scp=78649621878&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-16732-4_13
DO - 10.1007/978-3-642-16732-4_13
M3 - Conference contribution
AN - SCOPUS:78649621878
SN - 3642167314
SN - 9783642167317
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 113
EP - 121
BT - Computational Collective Intelligence
T2 - 2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010
Y2 - 10 November 2010 through 12 November 2010
ER -