TY - JOUR
T1 - Improving non-autoregressive machine translation via autoregressive training
AU - Wang, Shuheng
AU - Shi, Shumin
AU - Huang, Heyan
AU - Zhang, Wei
N1 - Publisher Copyright:
© 2021 Institute of Physics Publishing. All rights reserved.
PY - 2021/9/30
Y1 - 2021/9/30
N2 - In recent years, non-autoregressive machine translation has attracted many researchers’ attention. Non-autoregressive translation (NAT) achieves faster decoding speed at the cost of translation accuracy compared with autoregressive translation (AT). Since NAT and AT models have similar architectures, a natural idea is to use the AT task to assist the NAT task. Previous works use curriculum learning or distillation to improve the performance of the NAT model. However, they are complex to follow and difficult to integrate into new works. Therefore, in this paper, we introduce a simple multi-task framework to improve the performance of the NAT task. Specifically, we use a fully shared encoder-decoder network to train the NAT task and the AT task simultaneously. To evaluate the performance of our model, we conduct experiments on several benchmark tasks, including WMT14 EN-DE, WMT16 EN-RO, and IWSLT14 DE-EN. The experimental results demonstrate that our model achieves improvements while remaining simple.
AB - In recent years, non-autoregressive machine translation has attracted many researchers’ attention. Non-autoregressive translation (NAT) achieves faster decoding speed at the cost of translation accuracy compared with autoregressive translation (AT). Since NAT and AT models have similar architectures, a natural idea is to use the AT task to assist the NAT task. Previous works use curriculum learning or distillation to improve the performance of the NAT model. However, they are complex to follow and difficult to integrate into new works. Therefore, in this paper, we introduce a simple multi-task framework to improve the performance of the NAT task. Specifically, we use a fully shared encoder-decoder network to train the NAT task and the AT task simultaneously. To evaluate the performance of our model, we conduct experiments on several benchmark tasks, including WMT14 EN-DE, WMT16 EN-RO, and IWSLT14 DE-EN. The experimental results demonstrate that our model achieves improvements while remaining simple.
UR - http://www.scopus.com/inward/record.url?scp=85117591080&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/2031/1/012045
DO - 10.1088/1742-6596/2031/1/012045
M3 - Conference article
AN - SCOPUS:85117591080
SN - 1742-6588
VL - 2031
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012045
T2 - 2021 2nd International Conference on Signal Processing and Computer Science, SPCS 2021
Y2 - 20 August 2021 through 22 August 2021
ER -