Addressing Syntactic Divergence in Low-Resource Neural Machine Translation via Language Independent Word Reordering

Jiangcan Yixi, Chao Su, Shumin Shi*, Xiaobing Zhao*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Neural machine translation (NMT) trained on a combination of parallel and synthetic corpora has achieved impressive translation performance for several language pairs, where the synthetic corpus is typically generated by back-translating monolingual target sentences. However, in low-resource scenarios the quality of the synthetic corpus is poor, which reduces the contribution of data augmentation methods such as back-translation to translation quality, especially for syntactically distant language pairs. In this paper, we propose a novel solution that uses a language independent word reordering method to address syntactic divergences between the target and source languages. The method indirectly converts the word order of the target language to that of the source language by means of an assisting language that has a word order similar to the source language and sufficient sentence pairs with the target language. A higher-quality synthetic corpus can then be obtained by translating the source-ordered monolingual target sentences with a bilingual dictionary. The synthetic corpus and the parallel corpus are merged to train a more powerful NMT model. Experiments on real low-resource Tibetan-Chinese, Uyghur-Chinese and Mongolian-Chinese translation show that our method achieves significant improvements over other semi-supervised methods. Our word reordering method avoids problems such as insufficient reordering training data and immature lexical analysers.
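
The last two steps of the abstract (dictionary translation of source-ordered target sentences, then merging with the parallel corpus) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the placeholder tokens, and the choice to copy unknown words through unchanged are all assumptions made for illustration, and the reordering step itself is taken as already done.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (synthetic source sentence, original target sentence)


def dictionary_translate(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Translate token by token; words missing from the lexicon are copied
    through unchanged (an assumption, the abstract does not specify this)."""
    return [lexicon.get(tok, tok) for tok in tokens]


def build_synthetic_pairs(
    targets: List[List[str]],
    reordered_targets: List[List[str]],
    lexicon: Dict[str, str],
) -> List[Pair]:
    """Pair each original target sentence with a synthetic source sentence
    obtained by translating its source-ordered version word by word."""
    pairs = []
    for original, reordered in zip(targets, reordered_targets):
        source_tokens = dictionary_translate(reordered, lexicon)
        pairs.append((" ".join(source_tokens), " ".join(original)))
    return pairs


def merge_corpora(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Concatenate the genuine parallel corpus with the synthetic one."""
    return list(parallel) + list(synthetic)


# Toy data with hypothetical placeholder tokens (not real language data).
targets = [["I", "eat", "apples"]]            # target-language word order
reordered = [["I", "apples", "eat"]]          # same words in source word order
lexicon = {"I": "src_I", "eat": "src_eat", "apples": "src_apples"}

synthetic = build_synthetic_pairs(targets, reordered, lexicon)
merged = merge_corpora([("src_hello", "hello")], synthetic)
```

Because the target sentence is already in source word order, the word-by-word translation needs no reordering of its own; in this sketch the merged list is what would be passed to NMT training.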

Original language: English
Title of host publication: Intelligent Multilingual Information Processing - 1st International Conference, IMLIP 2024, Proceedings
Editors: Huaping Zhang, Jianyun Shang, Jinsong Su
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 103-124
Number of pages: 22
ISBN (Print): 9789819651221
Publication status: Published - 2025
Externally published: Yes
Event: 1st International Conference on Intelligent Multilingual Information Processing, IMLIP 2024 - Beijing, China
Duration: 16 Nov 2024 - 17 Nov 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2395 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 1st International Conference on Intelligent Multilingual Information Processing, IMLIP 2024
Country/Territory: China
City: Beijing
Period: 16/11/24 - 17/11/24

Keywords

  • Language Independent
  • Low-Resource Neural Machine Translation
  • Syntactic Divergence
  • Word Reordering
