Case-Sensitive Neural Machine Translation

Xuewen Shi, Heyan Huang, Ping Jian*, Yi Kun Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

1 Citation (Scopus)

Abstract

Although word case carries important lexical information in Latin-script languages, it is often ignored in machine translation. We observe that translation performance drops significantly under case-sensitive evaluation metrics. In this paper, we introduce two case-sensitive neural machine translation (NMT) approaches to alleviate this problem: i) adding case tokens to the decoding sequence, and ii) adding case prediction to conventional NMT. The proposed approaches incorporate case information into the NMT decoder by jointly learning target word generation and word case prediction. We compare our approaches with several kinds of baselines, including NMT with naive case-restoration methods, and analyze the impact of various setups on our approaches. Experimental results on three typical translation tasks (Zh-En, En-Fr, En-De) show that the proposed methods improve case-sensitive BLEU scores by up to 2.5, 1.0 and 0.5, respectively. Further analyses also illustrate why our approaches yield different improvements on different translation tasks.
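
To make approach i) concrete, the following is a minimal sketch of how case tokens might be interleaved with a lowercased target sequence and then removed again after decoding. The abstract does not specify the token inventory or placement, so the token names `<cap>` and `<upper>`, the helper names `encode_case` and `decode_case`, and the choice to put the case token before its word are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

# Hypothetical case tokens; the abstract does not name the actual symbols.
CAP = "<cap>"      # the next word starts with an uppercase letter
UPPER = "<upper>"  # the next word is entirely uppercase


def encode_case(tokens: List[str]) -> List[str]:
    """Lowercase each token and emit a case token before it when needed."""
    out: List[str] = []
    for tok in tokens:
        if len(tok) > 1 and tok.isupper():
            out.append(UPPER)
        elif tok[:1].isupper():
            out.append(CAP)
        out.append(tok.lower())
    return out


def decode_case(tokens: List[str]) -> List[str]:
    """Apply each case token to the word that follows it and drop the token."""
    out: List[str] = []
    pending = None
    for tok in tokens:
        if tok in (CAP, UPPER):
            pending = tok
            continue
        if pending == UPPER:
            tok = tok.upper()
        elif pending == CAP:
            tok = tok[:1].upper() + tok[1:]
        pending = None
        out.append(tok)
    return out


if __name__ == "__main__":
    sentence = ["The", "NMT", "model", "translates", "I", "."]
    encoded = encode_case(sentence)
    print(encoded)
    print(decode_case(encoded))
```

In this sketch the decoder would be trained on the encoded sequence, so word generation and case marking share one output vocabulary. The three-way scheme is lossy for mixed-case words such as "iPhone", which would need additional handling.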

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 24th Pacific-Asia Conference, PAKDD 2020, Proceedings
Editors: Hady W. Lauw, Ee-Peng Lim, Raymond Chi-Wing Wong, Alexandros Ntoulas, See-Kiong Ng, Sinno Jialin Pan
Publisher: Springer
Pages: 662-674
Number of pages: 13
ISBN (Print): 9783030474256
DOI
Publication status: Published - 2020
Event: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020 - Singapore, Singapore
Duration: 11 May 2020 to 14 May 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12084 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2020
Country/Territory: Singapore
City: Singapore
Period: 11/05/20 to 14/05/20
