TY - GEN
T1 - Improving Non-autoregressive Machine Translation with Soft-Masking
AU - Wang, Shuheng
AU - Shi, Shumin
AU - Huang, Heyan
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - In recent years, non-autoregressive machine translation has achieved great success due to its promising inference speedup. Non-autoregressive machine translation reduces decoding latency by generating all target words in a single pass. However, a considerable accuracy gap remains between non-autoregressive and autoregressive machine translation. Because it removes the dependencies between target words, non-autoregressive machine translation tends to generate repetitive or incorrect words, which leads to low translation quality. In this paper, we introduce a soft-masking method to alleviate this issue. Specifically, we introduce an autoregressive discriminator that outputs probabilities indicating which embeddings are likely correct. According to these probabilities, we then apply a soft mask to the copied representations, enabling the model to take into account which words are easy to predict. We evaluate our method on three benchmarks: WMT14 EN → DE, WMT16 EN → RO, and IWSLT14 DE → EN. The experimental results demonstrate that our method outperforms the baseline by a large margin at a small cost in inference speed.
KW - Machine translation
KW - Non-autoregressive
KW - Soft-masking
UR - http://www.scopus.com/inward/record.url?scp=85118159991&partnerID=8YFLogxK
DO - 10.1007/978-3-030-88480-2_12
M3 - Conference contribution
AN - SCOPUS:85118159991
SN - 9783030884796
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 141
EP - 152
BT - Natural Language Processing and Chinese Computing - 10th CCF International Conference, NLPCC 2021, Proceedings
A2 - Wang, Lu
A2 - Feng, Yansong
A2 - Hong, Yu
A2 - He, Ruifang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 10th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2021
Y2 - 13 October 2021 through 17 October 2021
ER -