Data augmentation under scarce condition for neural machine translation

Dan Luo, Shumin Shi, Rihai Su, Heyan Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Neural Machine Translation (NMT) achieves state-of-the-art performance when copious parallel corpora are available. For low-resource NMT tasks, however, the scarcity of training data inevitably leads to poor translation performance. To reduce the dependence on large bilingual corpora and to cut training time, we propose SMC, a novel data augmentation method for scarce-data conditions that Samples only the Monolingual Corpus sentences containing difficult words in the back-translation process for Mongolian-Chinese (Mn-Ch) and English-Chinese (En-Ch) NMT. Inspired by work on curriculum learning, our approach takes into account the varying difficulty of the samples and the corresponding model capabilities. Experimental results show that our method improves translation quality by up to 2.4 and 1.72 BLEU points over the baselines on the En-Ch and Mn-Ch datasets, respectively, while greatly reducing training time.

Original language: English
Title of host publication: Proceedings of 2019 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019
Editors: Xizhao Wang, Weining Wang, Xiangnan He
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 36-40
Number of pages: 5
ISBN (Electronic): 9781728138633
DOI
Publication status: Published - Dec 2019
Event: 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019 - Singapore, Singapore
Duration: 19 Dec 2019 to 21 Dec 2019

Publication series

Name: Proceedings of 2019 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019

Conference

Conference: 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019
Country/Territory: Singapore
City: Singapore
Period: 19/12/19 to 21/12/19
