ASR normalization for machine translation

Heyan Huang*, Chong Feng, Jiande Wang, Xiaofei Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Natural spoken language contains many meaningless modal particles and dittographies (accidentally repeated words). Furthermore, ASR (automatic speech recognition) output often contains recognition errors and has no punctuation. As a result, translation quality is rather poor when ASR results are translated directly by MT (machine translation). This paper introduces an ASR normalization approach for machine translation based on a maximum entropy sequential labeling model. Before translation, meaningless modal particles and dittographies are deleted, recognition errors are corrected, and punctuation is added to the ASR results. Experiments show that MT with normalization achieves a BLEU score of 0.2465, an improvement of 17.3% over the MT baseline without normalization. These results confirm that ASR normalization effectively improves translation quality in spoken language machine translation.
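The abstract frames normalization as sequential labeling with a maximum entropy model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the label set (KEEP, DEL, COMMA, PERIOD), the feature templates, the function and class names, and the tiny English toy corpus are all hypothetical stand-ins for real ASR transcripts. The model is multinomial logistic regression conditioned on the previous label (in the style of a maximum entropy Markov model) with greedy left-to-right decoding; a Viterbi or beam search is the more usual decoding choice. Recognition-error correction, which the paper also performs, is not covered.

```python
import math
from collections import defaultdict

# Hypothetical label set: keep the token, delete it (filler or repeat),
# or keep it and insert a comma / period after it.
LABELS = ["KEEP", "DEL", "COMMA", "PERIOD"]

def features(tokens, i, prev_label):
    """Hypothetical binary feature templates for token i."""
    w = tokens[i]
    prev_w = tokens[i - 1] if i > 0 else "<s>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        "w=" + w,
        "prev_w=" + prev_w,
        "next_w=" + next_w,
        "repeat=" + str(w == prev_w),  # cue for dittography
        "prev_label=" + prev_label,    # sequential (Markov) dependency
        "bias",
    ]

class MaxEntTagger:
    """Multinomial logistic regression (maximum entropy) over labels."""

    def __init__(self, labels=LABELS):
        self.labels = labels
        self.weights = defaultdict(float)  # (feature, label) -> weight

    def probs(self, feats):
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=100, lr=0.5, l2=0.001):
        """SGD on the conditional log-likelihood; L2 shrinkage is applied
        only to weights of active features (a common simplification)."""
        for _ in range(epochs):
            for tokens, gold in data:
                prev = "<START>"
                for i, y_gold in enumerate(gold):
                    feats = features(tokens, i, prev)
                    p = self.probs(feats)
                    for k, y in enumerate(self.labels):
                        grad = (1.0 if y == y_gold else 0.0) - p[k]
                        for f in feats:
                            key = (f, y)
                            self.weights[key] += lr * (grad - l2 * self.weights[key])
                    prev = y_gold  # train on the gold label history

    def tag(self, tokens):
        """Greedy left-to-right decoding using previously predicted labels."""
        labels, prev = [], "<START>"
        for i in range(len(tokens)):
            p = self.probs(features(tokens, i, prev))
            prev = self.labels[max(range(len(p)), key=p.__getitem__)]
            labels.append(prev)
        return labels

def normalize(tokens, labels):
    """Apply labels: drop DEL tokens and insert punctuation after tokens."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab == "DEL":
            continue
        out.append(tok)
        if lab == "COMMA":
            out.append(",")
        elif lab == "PERIOD":
            out.append(".")
    return " ".join(out)

# Hypothetical toy training data (English stand-in for ASR output).
TRAIN = [
    ("uh i want to to book a room".split(),
     ["DEL", "KEEP", "KEEP", "KEEP", "DEL", "KEEP", "KEEP", "PERIOD"]),
    ("um the room is is fine thanks".split(),
     ["DEL", "KEEP", "KEEP", "KEEP", "DEL", "COMMA", "PERIOD"]),
]

if __name__ == "__main__":
    tagger = MaxEntTagger()
    tagger.train(TRAIN)
    asr = "uh i want to to book a room".split()
    print(normalize(asr, tagger.tag(asr)))
```

Running the script trains on the two toy utterances and prints the normalized form of the first one. Conditioning each decision on the previous label is what makes this a sequential labeler rather than an independent per-token classifier; a practical system would need far richer features and real annotated ASR data.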

Original language: English
Title of host publication: Proceedings - 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010
Pages: 91-94
Number of pages: 4
Publication status: Published - 2010
Event: 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010 - Nanjing, China
Duration: 26 Aug 2010 to 28 Aug 2010

Publication series

Name: Proceedings - 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010
Volume: 2

Conference

Conference: 2010 2nd International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2010
Country/Territory: China
City: Nanjing
Period: 26/08/10 to 28/08/10

Keywords

  • Automatic speech recognition
  • Machine translation
  • Maximum entropy model
  • Normalization
  • Spoken language

