Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods

Xuewen Shi; Heyan Huang; Ping Jian; Yi Kun Tang

doi:10.1145/3583684

Xuewen Shi, Heyan Huang, Ping Jian^*, Yi Kun Tang

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

It is hard to evaluate translations objectively and accurately, which limits the applications of machine translation. In this article, we assume that this difficulty is caused by noise interference during translation evaluation, and we address the problem from the perspective of causal inference. We assume that the observable translation score is affected simultaneously by the unobservable true translation quality and by some noise. If there is a variable that is related to the noise and independent of the true translation quality, the related noise can be eliminated by removing the effect of that variable from the observed score. Based on this causal hypothesis, this article studies the length bias problem of beam search for neural machine translation (NMT) and the input-related noise problem of translation quality estimation (QE). For the NMT length bias problem, we conduct experiments on four typical NMT tasks (Uyghur-Chinese, Chinese-English, English-German, and English-French) with datasets of different scales. Compared with previous approaches, the proposed causally motivated method is model-agnostic and does not require supervised training. For QE tasks, we conduct experiments on the WMT'20 submissions. Experimental results show that the denoised QE results achieve better Pearson correlation with human-assessed scores than the original submissions. Further analyses on the NMT and QE tasks also support the empirical assumptions underlying our methods.
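
The denoising idea stated in the abstract (remove from the observed score whatever a noise-related variable can predict) can be illustrated with a minimal sketch of generic half-sibling regression. This is not the authors' implementation: the linear regressor, the function name `denoise_scores`, the use of output length as the noise-related variable, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def denoise_scores(observed, nuisance, degree=1):
    """Subtract the part of `observed` that a polynomial in `nuisance` predicts.

    Hypothetical helper: a polynomial least-squares fit stands in for the
    regressor that estimates E[observed | nuisance].
    """
    observed = np.asarray(observed, dtype=float)
    nuisance = np.asarray(nuisance, dtype=float)
    coeffs = np.polyfit(nuisance, observed, deg=degree)
    predicted = np.polyval(coeffs, nuisance)
    # Keep the residual; add the mean back so scores stay on the original scale.
    return observed - predicted + observed.mean()


# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 2000
true_quality = rng.normal(0.0, 1.0, n)   # unobservable in a real setting
length = rng.uniform(5.0, 50.0, n)       # drawn independently of true_quality
# Observed score = true quality + length-dependent bias + small residual noise.
observed = true_quality - 0.1 * length + rng.normal(0.0, 0.1, n)

denoised = denoise_scores(observed, length)

r_before = np.corrcoef(observed, true_quality)[0, 1]
r_after = np.corrcoef(denoised, true_quality)[0, 1]
print(f"Pearson r with true quality: before={r_before:.3f}, after={r_after:.3f}")
```

Because `length` is generated independently of `true_quality`, regressing the observed score on it can only capture the bias term, so the residual tracks the true quality more closely. With real data that independence is an empirical assumption that has to be checked rather than taken for granted.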

Original language	English
Article number	126
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing
Volume	22
Issue number	5
DOI	https://doi.org/10.1145/3583684
Publication status	Published - 9 May 2023

Cite this

@article{f46f154f88224f8da819860c693fddec,

title = "Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods",

abstract = "It is hard to evaluate translations objectively and accurately, which limits the applications of machine translation. In this article, we assume that the above phenomenon is caused by noise interference during translation evaluation, and we handle the problem through a perspective of causal inference. We assume that the observable translation score is affected by the unobservable true translation quality and some noise simultaneously. If there is a variable that is related to the noise and independent to the true translation quality, the related noise can be eliminated by removing the effect of that variable from the observed score. Based on the above causality hypothesis, this article studies the length bias problem of beam search for neural machine translation (NMT) and the input related noise problem of translation quality estimation (QE). For the NMT length bias problem, we conduct the experiments on four typical NMT tasks (Uyghur-Chinese, Chinese-English, English-German, and English-French) with different scales of datasets. Comparing with previous approaches, the proposed causal motivated method is model-agnostic and does not require supervised training. For QE tasks, we conduct the experiments on the WMT'20 submissions. Experimental results show that the denoised QE results gain better Pearson's correlation scores with human assessed scores compared to the original submissions. Further analyses on the NMT and QE tasks also demonstrate the rationality of the empirical assumptions made on our methods.",

keywords = "Neural machine translation, causal inference, half-sibling regression, quality estimation",

author = "Xuewen Shi and Heyan Huang and Ping Jian and Tang, {Yi Kun}",

note = "Publisher Copyright: {\textcopyright} 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.",

year = "2023",

month = may,

day = "9",

doi = "10.1145/3583684",

language = "English",

volume = "22",

journal = "ACM Transactions on Asian and Low-Resource Language Information Processing",

issn = "2375-4699",

publisher = "Association for Computing Machinery (ACM)",

number = "5",

}

TY - JOUR

T1 - Approximating to the Real Translation Quality for Neural Machine Translation via Causal Motivated Methods

AU - Shi, Xuewen

AU - Huang, Heyan

AU - Jian, Ping

AU - Tang, Yi Kun

PY - 2023/5/9

Y1 - 2023/5/9

AB - It is hard to evaluate translations objectively and accurately, which limits the applications of machine translation. In this article, we assume that the above phenomenon is caused by noise interference during translation evaluation, and we handle the problem through a perspective of causal inference. We assume that the observable translation score is affected by the unobservable true translation quality and some noise simultaneously. If there is a variable that is related to the noise and independent to the true translation quality, the related noise can be eliminated by removing the effect of that variable from the observed score. Based on the above causality hypothesis, this article studies the length bias problem of beam search for neural machine translation (NMT) and the input related noise problem of translation quality estimation (QE). For the NMT length bias problem, we conduct the experiments on four typical NMT tasks (Uyghur-Chinese, Chinese-English, English-German, and English-French) with different scales of datasets. Comparing with previous approaches, the proposed causal motivated method is model-agnostic and does not require supervised training. For QE tasks, we conduct the experiments on the WMT'20 submissions. Experimental results show that the denoised QE results gain better Pearson's correlation scores with human assessed scores compared to the original submissions. Further analyses on the NMT and QE tasks also demonstrate the rationality of the empirical assumptions made on our methods.

KW - Neural machine translation

KW - causal inference

KW - half-sibling regression

KW - quality estimation

UR - http://www.scopus.com/inward/record.url?scp=85162183220&partnerID=8YFLogxK

U2 - 10.1145/3583684

DO - 10.1145/3583684

M3 - Article

AN - SCOPUS:85162183220

SN - 2375-4699

VL - 22

JO - ACM Transactions on Asian and Low-Resource Language Information Processing

JF - ACM Transactions on Asian and Low-Resource Language Information Processing

IS - 5

M1 - 126

ER -
