TY - GEN
T1 - Improving Non-autoregressive Machine Translation with Soft-Masking
AU - Wang, Shuheng
AU - Shi, Shumin
AU - Huang, Heyan
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - In recent years, non-autoregressive machine translation has achieved great success due to its promising inference speedup. Non-autoregressive machine translation reduces decoding latency by generating all target words in a single pass. However, a considerable accuracy gap remains between non-autoregressive and autoregressive machine translation. Because it removes the dependencies between target words, non-autoregressive machine translation tends to generate repetitive or incorrect words, which leads to low translation quality. In this paper, we introduce a soft-masking method to alleviate this issue. Specifically, we introduce an autoregressive discriminator that outputs probabilities indicating which embeddings are likely correct. According to these probabilities, we then apply a soft mask to the copied representations, enabling the model to take into account which words are easy to predict. We evaluate our method on three benchmarks: WMT14 EN → DE, WMT16 EN → RO, and IWSLT14 DE → EN. The experimental results demonstrate that our method outperforms the baseline by a large margin at a small cost in inference speed.
KW - Machine translation
KW - Non-autoregressive
KW - Soft-masking
UR - http://www.scopus.com/inward/record.url?scp=85118159991&partnerID=8YFLogxK
DO - 10.1007/978-3-030-88480-2_12
M3 - Conference contribution
AN - SCOPUS:85118159991
SN - 9783030884796
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 141
EP - 152
BT - Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Proceedings
A2 - Wang, Lu
A2 - Feng, Yansong
A2 - Hong, Yu
A2 - He, Ruifang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2021
Y2 - 13 October 2021 through 17 October 2021
ER -