Data augmentation under scarce condition for neural machine translation

Dan Luo, Shumin Shi, Rihai Su, Heyan Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Neural Machine Translation (NMT) has achieved state-of-the-art performance, but that performance depends on the availability of copious parallel corpora. For low-resource NMT tasks, the scarcity of training data inevitably leads to poor translation performance. To reduce the dependence on large bilingual corpora and to cut training time, we propose SMC, a novel data augmentation method for scarce-data conditions that Samples a Monolingual Corpus, selecting only sentences that contain difficult words for the back-translation process, for Mongolian-Chinese (Mn-Ch) and English-Chinese (En-Ch) NMT. Inspired by work in curriculum learning, our approach takes into account both the varying difficulty of the samples and the corresponding capability of the model. Experimental results show that our method improves translation quality by up to 2.4 and 1.72 BLEU points over the baselines on the En-Ch and Mn-Ch datasets, respectively, while greatly reducing training time.
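
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how "difficult words" are defined, so the sketch assumes, purely for illustration, that a word is difficult when it occurs rarely in the target side of the parallel corpus. The function names, the `max_count` threshold, and the whitespace tokenization are all hypothetical and are not taken from the paper; the curriculum-learning component is not modeled here.

```python
from collections import Counter


def difficult_words(parallel_target_sentences, max_count=1):
    """Return words seen at most `max_count` times in the parallel target side.

    ASSUMPTION: low frequency is used as a stand-in for "difficulty";
    the paper's actual criterion may differ.
    """
    counts = Counter(
        word
        for sentence in parallel_target_sentences
        for word in sentence.split()  # assumes pre-tokenized, space-separated text
    )
    return {word for word, count in counts.items() if count <= max_count}


def sample_monolingual(monolingual_sentences, hard_words):
    """Keep only monolingual sentences containing at least one difficult word."""
    return [
        sentence
        for sentence in monolingual_sentences
        if hard_words.intersection(sentence.split())
    ]


# Toy data, invented for illustration only.
parallel_target = ["the cat sat", "the dog sat", "the cat ran"]
monolingual = ["a dog barked", "the cat sat", "she ran home"]

hard = difficult_words(parallel_target)
selected = sample_monolingual(monolingual, hard)
print(selected)
```

In a back-translation pipeline, only the `selected` sentences would then be passed to a target-to-source model to produce synthetic parallel pairs. Filtering first means fewer sentences to translate and train on, which is consistent with the training-time reduction the abstract reports.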

Original language: English
Title of host publication: Proceedings of 2019 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019
Editors: Xizhao Wang, Weining Wang, Xiangnan He
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 36-40
Number of pages: 5
ISBN (Electronic): 9781728138633
Publication status: Published - Dec 2019
Event: 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019 - Singapore, Singapore
Duration: 19 Dec 2019 → 21 Dec 2019

Publication series

Name: Proceedings of 2019 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019

Conference

Conference: 6th IEEE International Conference on Cloud Computing and Intelligence Systems, CCIS 2019
Country/Territory: Singapore
City: Singapore
Period: 19/12/19 → 21/12/19

Keywords

  • Competence-based curriculum learning
  • Data augmentation
  • Low-resource neural machine translation
  • Natural language processing
