Better Localness for Non-Autoregressive Transformer

Shuheng Wang, Heyan Huang, Shumin Shi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

The Non-Autoregressive Transformer has attracted much attention from researchers because of its low inference latency. Although the performance of the non-autoregressive transformer has improved significantly in recent years, a gap remains between it and the autoregressive transformer. Motivated by the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix based on the query tokens, key tokens, and their relative distance, and unify the localness module across the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.
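The abstract describes a localness mechanism in which a dynamic mask (or bias) over the attention scores is computed from the query tokens, key tokens, and their relative distance, and the same module is applied to both self-attention and cross-attention. The sketch below is a minimal, hypothetical PyTorch illustration of that idea, not the authors' implementation: the Gaussian form of the distance penalty, the `window` and `key_gate` predictors, and all names and hyperparameters are assumptions made for clarity.

```python
# Hypothetical sketch of a content- and distance-aware localness bias for attention.
# The window predictor, key gate, and Gaussian penalty are illustrative assumptions,
# not the exact formulation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalnessAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Predicts a per-query window width that controls the strength of the locality penalty.
        self.window = nn.Linear(d_model, n_heads)
        # Predicts a per-key gate so that key content also shapes the mask.
        self.key_gate = nn.Linear(d_model, n_heads)

    def forward(self, query, key_value):
        B, Tq, D = query.shape
        Tk = key_value.size(1)
        q = self.q_proj(query).view(B, Tq, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(key_value).view(B, Tk, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(key_value).view(B, Tk, self.n_heads, self.d_head).transpose(1, 2)

        # Standard scaled dot-product scores: (B, H, Tq, Tk).
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_head ** 0.5

        # Relative distance |i - j| between query position i and key position j.
        dist = (torch.arange(Tq).unsqueeze(1) - torch.arange(Tk).unsqueeze(0)).abs().float()
        dist = dist.to(query.device)  # (Tq, Tk)

        # Query-dependent window width; a larger width weakens the locality penalty.
        sigma = F.softplus(self.window(query)).transpose(1, 2).unsqueeze(-1) + 1e-6  # (B, H, Tq, 1)
        # Key-dependent gate in (0, 1); near 0 means the key is exempt from the penalty.
        gate = torch.sigmoid(self.key_gate(key_value)).transpose(1, 2).unsqueeze(2)  # (B, H, 1, Tk)

        # Soft localness mask: Gaussian penalty on distant positions, added to the scores.
        bias = gate * (-(dist ** 2) / (2 * sigma ** 2))  # broadcasts to (B, H, Tq, Tk)
        attn = torch.softmax(scores + bias, dim=-1)

        out = torch.matmul(attn, v).transpose(1, 2).reshape(B, Tq, D)
        return self.out_proj(out)
```

Under these assumptions, calling the module as `LocalnessAttention(d, h)(x, x)` corresponds to localness-biased self-attention, while passing decoder states as `query` and encoder states as `key_value` gives the cross-attention case, which is one way a single localness module can be shared by both attention types as the abstract suggests.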

Original language: English
Article number: 125
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 22
Issue number: 5
DOIs
Publication status: Published - 8 May 2023

Keywords

  • Non-autoregressive
  • attention module
  • localness
  • translation

