Enhanced encoder for non-autoregressive machine translation

Shuheng Wang; Shumin Shi; Heyan Huang

doi:10.1007/s10590-021-09285-x

Shuheng Wang, Shumin Shi^*, Heyan Huang

^*Corresponding author for this work

School of Computer Science

Nanjing University of Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Non-autoregressive machine translation aims to speed up the decoding procedure by discarding the autoregressive model and generating the target words independently. Because non-autoregressive machine translation fails to exploit target-side information, the ability to accurately model source representations is critical. In this paper, we propose an approach to enhance the encoder’s modeling ability by using a pre-trained BERT model as an extra encoder. With a different tokenization method, the BERT encoder and the Raw encoder can model the source input from different aspects. Furthermore, with a gate mechanism, the decoder can dynamically determine which representations contribute to the decoding process. Experimental results on three translation tasks show that our method can significantly improve the performance of non-autoregressive MT and surpass the baseline non-autoregressive models. On the WMT14 EN→DE translation task, our method achieves 27.87 BLEU with a single decoding step. This is comparable to the baseline autoregressive Transformer model, which obtains a score of 27.8 BLEU.
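
The gate mentioned in the abstract can be pictured with a small sketch. This is not the authors' formulation, which the abstract does not spell out. It assumes an element-wise sigmoid gate over two equal-length context vectors, one derived from the Raw encoder and one from the BERT encoder. The names `gated_fusion`, `w_raw`, `w_bert`, and `bias` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_raw, h_bert, w_raw, w_bert, bias):
    """Blend two equal-length context vectors with an element-wise gate.

    ASSUMPTION: the gate form below is illustrative, not taken from the paper.
    For each dimension, g = sigmoid(w_raw * r + w_bert * b + bias), and the
    fused value is g * r + (1 - g) * b.
    """
    if not (len(h_raw) == len(h_bert) == len(w_raw) == len(w_bert) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for r, b, wr, wb, c in zip(h_raw, h_bert, w_raw, w_bert, bias):
        g = sigmoid(wr * r + wb * b + c)  # gate value in (0, 1)
        fused.append(g * r + (1.0 - g) * b)
    return fused


if __name__ == "__main__":
    # Zero weights and bias give g = 0.5, so the result is the plain average.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))
```

With all gate parameters at zero, the gate is 0.5 in every dimension and the two representations are averaged. A large positive bias pushes the gate toward 1, so the output follows the raw-encoder vector; a large negative bias pushes it toward the BERT-encoder vector.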

Original language	English
Pages (from-to)	595-609
Number of pages	15
Journal	Machine Translation
Volume	35
Issue number	4
DOI	https://doi.org/10.1007/s10590-021-09285-x
Publication status	Published - Dec 2021

Cite this

Wang, S., Shi, S., & Huang, H. (2021). Enhanced encoder for non-autoregressive machine translation. Machine Translation, 35(4), 595-609. https://doi.org/10.1007/s10590-021-09285-x

@article{1db8076afcc3454b8e50ab8da52837f0,

title = "Enhanced encoder for non-autoregressive machine translation",

abstract = "Non-autoregressive machine translation aims to speed up the decoding procedure by discarding the autoregressive model and generating the target words independently. Because non-autoregressive machine translation fails to exploit target-side information, the ability to accurately model source representations is critical. In this paper, we propose an approach to enhance the encoder{\textquoteright}s modeling ability by using a pre-trained BERT model as an extra encoder. With a different tokenization method, the BERT encoder and the Raw encoder can model the source input from different aspects. Furthermore, having a gate mechanism, the decoder can dynamically determine which representations contribute to the decoding process. Experimental results on three translation tasks show that our method can significantly improve the performance of non-autoregressive MT, and surpass the baseline non-autoregressive models. On the WMT14 EN→DE translation task, our method achieves 27.87 BLEU with a single decoding step. This is a comparable result with the baseline autoregressive Transformer model which obtains a score of 27.8 BLEU.",

keywords = "Machine translation, Non-autoregressive, Pre-training language model",

author = "Shuheng Wang and Shumin Shi and Heyan Huang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Nature B.V. 2021.",

year = "2021",

month = dec,

doi = "10.1007/s10590-021-09285-x",

language = "English",

volume = "35",

pages = "595--609",

journal = "Machine Translation",

issn = "0922-6567",

publisher = "Springer Science and Business Media B.V.",

number = "4",

}

TY - JOUR

T1 - Enhanced encoder for non-autoregressive machine translation

AU - Wang, Shuheng

AU - Shi, Shumin

AU - Huang, Heyan

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Nature B.V. 2021.

PY - 2021/12

Y1 - 2021/12

N2 - Non-autoregressive machine translation aims to speed up the decoding procedure by discarding the autoregressive model and generating the target words independently. Because non-autoregressive machine translation fails to exploit target-side information, the ability to accurately model source representations is critical. In this paper, we propose an approach to enhance the encoder’s modeling ability by using a pre-trained BERT model as an extra encoder. With a different tokenization method, the BERT encoder and the Raw encoder can model the source input from different aspects. Furthermore, having a gate mechanism, the decoder can dynamically determine which representations contribute to the decoding process. Experimental results on three translation tasks show that our method can significantly improve the performance of non-autoregressive MT, and surpass the baseline non-autoregressive models. On the WMT14 EN→DE translation task, our method achieves 27.87 BLEU with a single decoding step. This is a comparable result with the baseline autoregressive Transformer model which obtains a score of 27.8 BLEU.

AB - Non-autoregressive machine translation aims to speed up the decoding procedure by discarding the autoregressive model and generating the target words independently. Because non-autoregressive machine translation fails to exploit target-side information, the ability to accurately model source representations is critical. In this paper, we propose an approach to enhance the encoder’s modeling ability by using a pre-trained BERT model as an extra encoder. With a different tokenization method, the BERT encoder and the Raw encoder can model the source input from different aspects. Furthermore, having a gate mechanism, the decoder can dynamically determine which representations contribute to the decoding process. Experimental results on three translation tasks show that our method can significantly improve the performance of non-autoregressive MT, and surpass the baseline non-autoregressive models. On the WMT14 EN→DE translation task, our method achieves 27.87 BLEU with a single decoding step. This is a comparable result with the baseline autoregressive Transformer model which obtains a score of 27.8 BLEU.

KW - Machine translation

KW - Non-autoregressive

KW - Pre-training language model

UR - http://www.scopus.com/inward/record.url?scp=85119142767&partnerID=8YFLogxK

U2 - 10.1007/s10590-021-09285-x

DO - 10.1007/s10590-021-09285-x

M3 - Article

AN - SCOPUS:85119142767

SN - 0922-6567

VL - 35

SP - 595

EP - 609

JO - Machine Translation

JF - Machine Translation

IS - 4

ER -
