Pretreatment for speech machine translation

Xiaofei Zhang*, Chong Feng, Heyan Huang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Natural spoken language contains many meaningless modal particles and dittographs (unintended word repetitions). Furthermore, ASR (automatic speech recognition) output often contains recognition errors and lacks punctuation. As a result, translation quality is poor when ASR output is fed directly to MT (machine translation), so the irregular ASR output needs to be transformed into normalized text suitable for machine translation. This paper introduces a pretreatment approach based on a conditional random field model that deletes the meaningless modal particles and dittographs, corrects recognition errors, and punctuates the ASR output before machine translation. Experiments show an MT BLEU score of 0.2497, an improvement of 18.4% over the MT baseline without pretreatment.
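
As a rough illustration of the deletion step only, the cleanup can be viewed as tagging each token as keep or delete and then dropping the deleted ones. The sketch below is not the paper's conditional random field model: it substitutes a hand-written rule for the learned tagger, and the filler list, the function names `tag_tokens` and `pretreat`, and the English example sentence are all assumptions made for illustration.

```python
# Hypothetical filler inventory; the paper's actual list of modal
# particles is not given in this record.
FILLERS = {"uh", "um", "er", "ah"}


def tag_tokens(tokens):
    """Tag each token KEEP or DEL.

    Rule-based stand-in for a learned sequence tagger (assumption):
    a token is deleted if it is a filler or if it repeats the most
    recently kept token.
    """
    tags = []
    last_kept = None
    for tok in tokens:
        low = tok.lower()
        if low in FILLERS or low == last_kept:
            tags.append("DEL")
        else:
            tags.append("KEEP")
            last_kept = low
    return tags


def pretreat(text):
    """Drop tokens tagged DEL and rejoin the rest with single spaces."""
    tokens = text.split()
    tags = tag_tokens(tokens)
    return " ".join(tok for tok, tag in zip(tokens, tags) if tag == "KEEP")


print(pretreat("um I I want want to to go"))
```

A fixed rule like this would also remove legitimate repetitions (for example "had had"), which is one reason a model that decides from context may be preferable. Error correction and punctuation insertion are not covered by this sketch.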

Original language: English
Title of host publication: Computational Collective Intelligence
Subtitle of host publication: Technologies and Applications - Second International Conference, ICCCI 2010, Proceedings
Pages: 113-121
Number of pages: 9
Edition: PART 2
Publication status: Published - 2010
Event: 2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010 - Kaohsiung, Taiwan, Province of China
Duration: 10 Nov 2010 to 12 Nov 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 6422 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Computational Collective Intelligence - Technologies and Applications, ICCCI 2010
Country/Territory: Taiwan, Province of China
City: Kaohsiung
Period: 10/11/10 to 12/11/10

Keywords

  • Automatic speech recognition
  • Conditional random field model
  • Pretreatment
  • Speech machine translation
