TY - GEN
T1 - Enlivening Redundant Heads in Multi-head Self-attention for Machine Translation
AU - Zhang, Tianfu
AU - Huang, Heyan
AU - Feng, Chong
AU - Cao, Longbing
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
N2 - Multi-head self-attention has recently attracted enormous interest owing to its specialized functions, highly parallelizable computation, and flexible extensibility. However, recent empirical studies show that some self-attention heads contribute little and can be pruned as redundant heads. This work takes a novel perspective of identifying and then vitalizing redundant heads. We propose a redundant head enlivening (RHE) method to precisely identify redundant heads and then vitalize their potential by learning syntactic relations and prior knowledge in text, without sacrificing the roles of important heads. Two novel syntax-enhanced attention (SEA) mechanisms, a dependency mask bias and a relative local-phrasal position bias, are introduced to revise self-attention distributions for syntactic enhancement in machine translation. The importance of individual heads is dynamically evaluated during redundant head identification, and SEA is then applied to vitalize redundant heads while maintaining the strength of important heads. Experimental results on WMT14 and WMT16 English→German and English→Czech machine translation validate the effectiveness of RHE.
AB - Multi-head self-attention has recently attracted enormous interest owing to its specialized functions, highly parallelizable computation, and flexible extensibility. However, recent empirical studies show that some self-attention heads contribute little and can be pruned as redundant heads. This work takes a novel perspective of identifying and then vitalizing redundant heads. We propose a redundant head enlivening (RHE) method to precisely identify redundant heads and then vitalize their potential by learning syntactic relations and prior knowledge in text, without sacrificing the roles of important heads. Two novel syntax-enhanced attention (SEA) mechanisms, a dependency mask bias and a relative local-phrasal position bias, are introduced to revise self-attention distributions for syntactic enhancement in machine translation. The importance of individual heads is dynamically evaluated during redundant head identification, and SEA is then applied to vitalize redundant heads while maintaining the strength of important heads. Experimental results on WMT14 and WMT16 English→German and English→Czech machine translation validate the effectiveness of RHE.
UR - http://www.scopus.com/inward/record.url?scp=85127423436&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85127423436
T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 3238
EP - 3248
BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PB - Association for Computational Linguistics (ACL)
T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Y2 - 7 November 2021 through 11 November 2021
ER -