TY - GEN
T1 - Importance-Based Neuron Selective Distillation for Interference Mitigation in Multilingual Neural Machine Translation
AU - Zhang, Jiarui
AU - Huang, Heyan
AU - Hu, Yue
AU - Guo, Ping
AU - Xie, Yuqiang
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Multilingual neural machine translation employs a single model to translate multiple languages, enabling efficient cross-lingual transferability through shared parameters. However, multilingual training suffers from negative language interference, especially interference with high-resource languages. Existing approaches generally use language-specific modules to distinguish heterogeneous characteristics among different languages but suffer from the parameter explosion problem. In this paper, we propose a “divide and conquer” multilingual translation training method based on the importance of neurons that can mitigate negative language interference effectively without adding additional parameters. The key technologies can be summarized as estimation, pruning, distillation, and fine-tuning. Specifically, we estimate the importance of existing pre-trained model neurons, dividing them into the important ones representing general knowledge of each language and the unimportant ones representing individual knowledge of each low-resource language. Then, we prune the pre-trained model, retaining only the important neurons, and train the pruned model supervised by the original complete model via selective distillation to compensate for some performance loss due to unstructured pruning. Finally, we restore the pruned neurons and only fine-tune them. Experimental results on several language pairs demonstrate the effectiveness of the proposed method.
KW - Importance estimation
KW - Multilingual translation
KW - Negative language interference
KW - Pruning
KW - Selective knowledge distillation
UR - http://www.scopus.com/inward/record.url?scp=85173063376&partnerID=8YFLogxK
DO - 10.1007/978-3-031-40292-0_12
M3 - Conference contribution
AN - SCOPUS:85173063376
SN - 9783031402913
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 140
EP - 150
BT - Knowledge Science, Engineering and Management - 16th International Conference, KSEM 2023, Proceedings
A2 - Jin, Zhi
A2 - Jiang, Yuncheng
A2 - Ma, Wenjun
A2 - Buchmann, Robert Andrei
A2 - Ghiran, Ana-Maria
A2 - Bi, Yaxin
PB - Springer Science and Business Media Deutschland GmbH
T2 - Knowledge Science, Engineering and Management - 16th International Conference, KSEM 2023, Proceedings
Y2 - 16 August 2023 through 18 August 2023
ER -