Better Localness for Non-Autoregressive Transformer

Shuheng Wang, Heyan Huang, Shumin Shi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

The Non-Autoregressive Transformer has attracted much attention from researchers due to its low inference latency. Although its performance has improved significantly in recent years, there is still a gap between the non-autoregressive transformer and the autoregressive transformer. Motivated by the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix based on the query tokens, key tokens, and their relative distance, and unify the localness module across the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.
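The abstract does not spell out the exact formulation of the dynamic mask. A minimal sketch of one plausible reading is given below: a Gaussian-shaped localness bias over the query-key relative distance, with a per-query window width predicted from the query and key representations. The module name LocalnessAttention, the window_proj parameterisation, and the mean-pooled key summary are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalnessAttention(nn.Module):
    """Scaled dot-product attention with a dynamic localness bias (sketch).

    Hypothetical reading of the abstract: a Gaussian-shaped penalty over the
    relative distance between query and key positions, with a per-query
    window width predicted from the query and key representations.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Predicts a positive window width (sigma) for each query position.
        self.window_proj = nn.Linear(2 * d_model, 1)

    def forward(self, queries, keys, values):
        # queries: (B, Lq, d); keys, values: (B, Lk, d)
        q = self.q_proj(queries)
        k = self.k_proj(keys)
        v = self.v_proj(values)

        scores = torch.matmul(q, k.transpose(-1, -2)) / self.d_model ** 0.5  # (B, Lq, Lk)

        # Relative distance |i - j| between query position i and key position j.
        lq, lk = q.size(1), k.size(1)
        pos_q = torch.arange(lq, device=q.device).unsqueeze(1)
        pos_k = torch.arange(lk, device=q.device).unsqueeze(0)
        distance = (pos_q - pos_k).abs().float()  # (Lq, Lk)

        # Per-query window width conditioned on the query and a mean-pooled key
        # summary (an assumption; the paper's parameterisation may differ).
        key_summary = k.mean(dim=1, keepdim=True).expand(-1, lq, -1)
        sigma = F.softplus(self.window_proj(torch.cat([q, key_summary], dim=-1)))  # (B, Lq, 1)

        # Gaussian-shaped localness bias: distant keys are penalised more.
        bias = -(distance.unsqueeze(0) ** 2) / (2.0 * sigma ** 2 + 1e-6)

        attn = torch.softmax(scores + bias, dim=-1)
        return torch.matmul(attn, v)
```

Under these assumptions, the same module could serve both attention types described above: self-attention (queries and keys are the decoder states) and cross-attention (keys and values are the encoder outputs), which mirrors the unified treatment the abstract describes.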

Original language: English
Article number: 125
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 22
Issue: 5
DOI: 10.1145/3587266
Publication status: Published - 8 May 2023


Cite this

Wang, S., Huang, H., & Shi, S. (2023). Better Localness for Non-Autoregressive Transformer. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(5), Article 125. https://doi.org/10.1145/3587266