Revisiting Back-Translation for Low-Resource Machine Translation between Chinese and Vietnamese

Hongzheng Li, Jiu Sha, Can Shi*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

13 citations (Scopus)

Abstract

Back-translation (BT) has been widely used and has become one of the standard techniques for data augmentation in Neural Machine Translation (NMT). BT has proven effective at improving translation performance, especially in low-resource scenarios. While most previous work on BT focuses on closely related European languages, few studies examine less-related languages from other parts of the world. In this paper, we choose a less-related language pair in Asia, Chinese and Vietnamese, to investigate the impact of BT on extremely low-resource machine translation between them. We first discuss the similarities and differences between the two languages. We then evaluate and compare the effects of different sizes of back-translated data on NMT and Statistical Machine Translation (SMT) models for Chinese-Vietnamese and Vietnamese-Chinese, in both character-based and word-based settings, and further analyze the translation outputs from several aspects. Some conclusions from previous work are partially confirmed, and we also draw new findings and conclusions that help deepen the understanding of BT for translation between less-related low-resource languages.

Original language: English
Article number: 9129718
Pages (from-to): 119931-119939
Number of pages: 9
Journal: IEEE Access
Volume: 8
Publication status: Published - 2020
