TY - GEN
T1 - ASR normalization for machine translation
AU - Huang, Heyan
AU - Feng, Chong
AU - Wang, Jiande
AU - Zhang, Xiaofei
PY - 2010
Y1 - 2010
N2 - Natural spoken language contains many meaningless modal particles and dittographies. Furthermore, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is poor when ASR results are translated directly by MT (machine translation). In this paper, an ASR normalization approach for machine translation, based on a maximum entropy sequential labeling model, is introduced. Before translation, the meaningless modal particles and dittographies were deleted, the recognition errors were corrected, and the ASR results were punctuated. Experiments show that MT achieves a BLEU score of 0.2465, an improvement of 17.3% over the MT baseline without normalization. These positive results confirm that ASR normalization is effective for improving translation quality in spoken language machine translation.
AB - Natural spoken language contains many meaningless modal particles and dittographies. Furthermore, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is poor when ASR results are translated directly by MT (machine translation). In this paper, an ASR normalization approach for machine translation, based on a maximum entropy sequential labeling model, is introduced. Before translation, the meaningless modal particles and dittographies were deleted, the recognition errors were corrected, and the ASR results were punctuated. Experiments show that MT achieves a BLEU score of 0.2465, an improvement of 17.3% over the MT baseline without normalization. These positive results confirm that ASR normalization is effective for improving translation quality in spoken language machine translation.
KW - Automatic speech recognition
KW - Machine translation
KW - Maximum entropy model
KW - Normalization
KW - Spoken language
UR - http://www.scopus.com/inward/record.url?scp=78449307023&partnerID=8YFLogxK
U2 - 10.1109/IHMSC.2010.122
DO - 10.1109/IHMSC.2010.122
M3 - Conference contribution
AN - SCOPUS:78449307023
SN - 9780769541518
T3 - Proceedings - 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010
SP - 91
EP - 94
BT - Proceedings - 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010
T2 - 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010
Y2 - 26 August 2010 through 28 August 2010
ER -