How does Attention Affect the Model?

Cheng Zhang, Qiuchi Li, Lingyu Hua, Dawei Song*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

The attention layer has become a prevalent component for improving the effectiveness of neural network models on NLP tasks. Why attention is effective, and how it can be interpreted, have attracted widespread deliberation. Current studies mostly investigate the effect of the attention mechanism through the attention distribution it generates on a single neural network structure. However, they do not consider the changes that attention induces in the semantic capability of different components of the model, which can vary across network structures. In this paper, we propose a comprehensive analytical framework that exploits a convex hull representation of sequence semantics in an n-dimensional Semantic Euclidean Space and defines a series of indicators to capture the impact of attention on sequence semantics. Through a series of experiments on various NLP tasks and three representative recurrent units, we analyze why and how attention benefits the semantic capacity of different types of recurrent neural networks, based on the indicators defined in the proposed framework.
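To make the core idea concrete, here is a minimal sketch, not the authors' implementation: it treats a sequence's token representations as points in a low-dimensional stand-in for the Semantic Euclidean Space and uses the volume of their convex hull as one hypothetical indicator of semantic capacity. The function name `semantic_volume`, the toy attention reweighting, and the choice of dimension are illustrative assumptions; the paper defines its own indicators.

```python
# Illustrative sketch only (assumed names, not the paper's implementation):
# measure how much of the semantic space a sequence spans via the volume of
# the convex hull of its token representations.
import numpy as np
from scipy.spatial import ConvexHull

def semantic_volume(embeddings: np.ndarray) -> float:
    """Volume of the convex hull spanned by token representations.

    `embeddings` has shape (seq_len, dim); Qhull requires seq_len > dim
    and non-degenerate points, so a low `dim` keeps the hull computable.
    """
    return ConvexHull(embeddings).volume

rng = np.random.default_rng(0)
dim = 3                                  # low-dimensional stand-in space
hidden = rng.normal(size=(20, dim))      # stand-in for recurrent hidden states

# Toy "attention" reweighting: softmax scores rescale each state, so salient
# states stretch the hull while low-weight states shrink toward the origin.
scores = rng.normal(size=20)
weights = np.exp(scores) / np.exp(scores).sum()
attended = hidden * (20 * weights)[:, None]

print(f"volume without attention: {semantic_volume(hidden):.3f}")
print(f"volume with attention:    {semantic_volume(attended):.3f}")
```

Comparing the two printed volumes mimics, in spirit, how an indicator over the convex hull could quantify attention's effect on the semantic region a sequence covers.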

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: ACL-IJCNLP 2021
Editors: Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher: Association for Computational Linguistics (ACL)
Pages: 256-268
Number of pages: 13
ISBN (Electronic): 9781954085541
Publication status: Published - 2021
Event: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 - Virtual, Online
Duration: 1 Aug 2021 - 6 Aug 2021

Publication series

Name: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Conference

Conference: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
City: Virtual, Online
Period: 1/08/21 - 6/08/21
