Prosodic annotation enriched statistical machine translation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Machine translation has increasingly drawn on linguistic information such as part of speech, syntactic structure, and discourse context to improve performance. However, conventional approaches typically ignore key information beyond the text, such as prosody. In this paper, we exploit three prosodic features: pronunciation (phonetic alphabet and tone), prosodic boundaries, and emphasis. Based on the annotated data, a conditional random fields (CRF) sequential tagger is used to label Chinese sentences with prosodic tags, and three methods are presented to integrate these features: (1) factored translation models, where the prosodic features are incorporated as factors; (2) a word lattice decoding model, where the prosodic boundaries are treated as an alternative to the tokenization boundaries; (3) re-ranking models, where the prosodic features are integrated into the language model to re-score the n-best translation candidates. We evaluate the proposed methods using the bilingual evaluation understudy (BLEU) score in both the English-to-Chinese (E2C) and Chinese-to-English (C2E) translation directions. Experiments show that, with prosodic features, the re-ranking model achieves a significant improvement, while the word lattice decoding and factored translation models also improve performance.
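The re-ranking idea in method (3) can be illustrated with a minimal sketch. This is not the paper's implementation: the `word|tag` factored token format, the tag names (`B0`, `B3`), the add-one smoothed bigram model, and the interpolation `weight` are all assumptions chosen for illustration. Each n-best candidate carries a decoder score, and a language model trained over prosody-annotated tokens contributes an additional score.

```python
import math
from collections import Counter


def factored(tokens, tags):
    """Join each word with its prosodic tag, e.g. 'hao|B3' (hypothetical format)."""
    return [f"{w}|{t}" for w, t in zip(tokens, tags)]


class BigramLM:
    """Add-one smoothed bigram language model over factored tokens (toy model)."""

    def __init__(self, sentences):
        self.contexts = Counter()
        self.bigrams = Counter()
        self.vocab = {"<s>", "</s>"}
        for sent in sentences:
            padded = ["<s>"] + sent + ["</s>"]
            self.vocab.update(sent)
            self.contexts.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))

    def logprob(self, sent):
        padded = ["<s>"] + sent + ["</s>"]
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + v))
            for a, b in zip(padded[:-1], padded[1:])
        )


def rerank(nbest, lm, weight=0.5):
    """Sort candidates by decoder score plus weighted prosodic LM log-probability."""

    def combined(cand):
        return cand["score"] + weight * lm.logprob(
            factored(cand["tokens"], cand["tags"])
        )

    return sorted(nbest, key=combined, reverse=True)


# Toy usage: two candidates with equal decoder scores but different prosodic tags.
train = [factored(["ni", "hao"], ["B0", "B3"])] * 3
lm = BigramLM(train)
nbest = [
    {"tokens": ["ni", "hao"], "tags": ["B3", "B0"], "score": -1.0},
    {"tokens": ["ni", "hao"], "tags": ["B0", "B3"], "score": -1.0},
]
best = rerank(nbest, lm)[0]
```

In this toy setup the candidate whose tag sequence matches the training data is ranked first, because the prosodic language model breaks the tie between otherwise equal decoder scores. A real system would tune the weight on held-out data and use a far stronger language model.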

Original language: English
Title of host publication: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Editors: Hsin-Min Wang, Qingzhi Hou, Yuan Wei, Tan Lee, Jianguo Wei, Lei Xie, Hui Feng, Jianwu Dang
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509042937
Publication status: Published - 2 May 2017
Event: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016 - Tianjin, China
Duration: 17 Oct 2016 to 20 Oct 2016

Publication series

Name: Proceedings of 2016 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016

Conference

Conference: 10th International Symposium on Chinese Spoken Language Processing, ISCSLP 2016
Country/Territory: China
City: Tianjin
Period: 17/10/16 to 20/10/16

Keywords

  • Factored model
  • Machine translation
  • Prosody
  • Re-ranking
  • Word lattice
