Revisiting Back-Translation for Low-Resource Machine Translation between Chinese and Vietnamese

Hongzheng Li, Jiu Sha, Can Shi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

Back-translation (BT) has been widely used and has become one of the standard techniques for data augmentation in Neural Machine Translation (NMT). BT has proven effective at improving translation performance, especially in low-resource scenarios. While most previous work on BT focuses on closely related European languages, few studies examine less-related languages in other parts of the world. In this paper, we choose a less-related language pair in Asia, Chinese and Vietnamese, to investigate the impact of BT on extremely low-resource machine translation between them. We first discuss the similarities and differences between the two languages, then evaluate and compare the effects of different sizes of back-translated data on NMT and Statistical Machine Translation (SMT) models for Chinese-Vietnamese and Vietnamese-Chinese, in both character-based and word-based settings, and further analyze the translation outputs from several aspects. Some conclusions from previous work are partially confirmed, and we also draw some new findings and conclusions that help deepen the understanding of BT for translation between less-related low-resource languages.

Original language: English
Article number: 9129718
Pages (from-to): 119931-119939
Number of pages: 9
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020

Keywords

  • Back-translation
  • Chinese
  • Vietnamese
  • low-resource languages
  • machine translation
