Better Localness for Non-Autoregressive Transformer

Shuheng Wang; Heyan Huang; Shumin Shi

doi:10.1145/3587266

Shuheng Wang, Heyan Huang, Shumin Shi^*

^*Corresponding author for this work

School of Computer Science

Nanjing University of Science and Technology

Research output: Contribution to journal › Article › Peer-reviewed

1 citation (Scopus)

Abstract

The Non-Autoregressive Transformer has attracted much attention from researchers because of its low inference latency. Although its performance has improved significantly in recent years, a gap remains between the non-autoregressive transformer and the autoregressive transformer. Given the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix according to the query tokens, key tokens, and relative distance, and unify the localness module for the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.
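The abstract gives no formula for the dynamic mask, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a Gaussian penalty on relative distance whose width is set per query-key pair by a sigmoid gate. The names `local_attention_weights`, `u`, `v`, and `max_width` are hypothetical, and rescaling query positions onto the key axis is just one assumed way to let the same code serve both self-attention and cross-attention.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def local_attention_weights(queries, keys, u, v, max_width=4.0):
    """Attention weights with an additive, content-dependent locality bias.

    Hypothetical sketch, NOT the paper's formulation:
    - u and v are assumed learned vectors that map a query/key pair to a gate
    - the gate sets a per-pair window width sigma in (1, 1 + max_width)
    - the bias is a Gaussian penalty on the relative distance
    """
    n_q, n_k, dim = len(queries), len(keys), len(queries[0])
    # Rescale query positions onto the key axis so the same code covers
    # self-attention (n_q == n_k) and cross-attention (n_q != n_k).
    scale = (n_k - 1) / (n_q - 1) if n_q > 1 else 0.0
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            score = dot(q, k) / math.sqrt(dim)
            # Gate depends on both the query token and the key token.
            gate = 1.0 / (1.0 + math.exp(-(dot(q, u) + dot(k, v))))
            sigma = 1.0 + max_width * gate
            distance = i * scale - j
            # Distant keys are penalised more when the window is narrow.
            row.append(score - distance ** 2 / (2.0 * sigma ** 2))
        weights.append(softmax(row))
    return weights
```

Calling `local_attention_weights(x, x, u, v)` with the same sequence for queries and keys gives the self-attention case, while passing decoder states as `queries` and encoder states as `keys` gives the cross-attention case. In both, each row is a probability distribution that favours nearby positions unless the content score outweighs the distance penalty.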

Original language	English
Article number	125
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing
Volume	22
Issue number	5
DOI	https://doi.org/10.1145/3587266
Publication status	Published - 8 May 2023

Access to document

10.1145/3587266

Other files and links

Link to publication in Scopus

Cite this

Wang, S., Huang, H., & Shi, S. (2023). Better Localness for Non-Autoregressive Transformer. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(5), Article 125. https://doi.org/10.1145/3587266

@article{244a97c8cef642ed98a07532c9dc3382,

title = "Better Localness for Non-Autoregressive Transformer",

abstract = "The Non-Autoregressive Transformer has attracted much attention from researchers because of its low inference latency. Although its performance has improved significantly in recent years, a gap remains between the non-autoregressive transformer and the autoregressive transformer. Given the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix according to the query tokens, key tokens, and relative distance, and unify the localness module for the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.",

keywords = "Non-autoregressive, attention module, localness, translation",

author = "Shuheng Wang and Heyan Huang and Shumin Shi",

note = "Publisher Copyright: {\textcopyright} 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.",

year = "2023",

month = may,

day = "8",

doi = "10.1145/3587266",

language = "English",

volume = "22",

journal = "ACM Transactions on Asian and Low-Resource Language Information Processing",

issn = "2375-4699",

publisher = "Association for Computing Machinery (ACM)",

number = "5",

}

TY - JOUR

T1 - Better Localness for Non-Autoregressive Transformer

AU - Wang, Shuheng

AU - Huang, Heyan

AU - Shi, Shumin

PY - 2023/5/8

Y1 - 2023/5/8

N2 - The Non-Autoregressive Transformer has attracted much attention from researchers because of its low inference latency. Although its performance has improved significantly in recent years, a gap remains between the non-autoregressive transformer and the autoregressive transformer. Given the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix according to the query tokens, key tokens, and relative distance, and unify the localness module for the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.

AB - The Non-Autoregressive Transformer has attracted much attention from researchers because of its low inference latency. Although its performance has improved significantly in recent years, a gap remains between the non-autoregressive transformer and the autoregressive transformer. Given the success of localness in the autoregressive transformer, in this work we incorporate localness into the non-autoregressive transformer. Specifically, we design a dynamic mask matrix according to the query tokens, key tokens, and relative distance, and unify the localness module for the self-attention and cross-attention modules. We conduct experiments on several benchmark tasks, and the results show that our model significantly improves the performance of the non-autoregressive transformer.

KW - Non-autoregressive

KW - attention module

KW - localness

KW - translation

UR - http://www.scopus.com/inward/record.url?scp=85162175390&partnerID=8YFLogxK

U2 - 10.1145/3587266

DO - 10.1145/3587266

M3 - Article

AN - SCOPUS:85162175390

SN - 2375-4699

VL - 22

JO - ACM Transactions on Asian and Low-Resource Language Information Processing

JF - ACM Transactions on Asian and Low-Resource Language Information Processing

IS - 5

M1 - 125

ER -
