Importance-Based Neuron Selective Distillation for Interference Mitigation in Multilingual Neural Machine Translation

Jiarui Zhang, Heyan Huang*, Yue Hu, Ping Guo, Yuqiang Xie

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Multilingual neural machine translation employs a single model to translate between multiple languages, enabling efficient cross-lingual transfer through shared parameters. However, multilingual training suffers from negative language interference, especially for high-resource languages. Existing approaches generally use language-specific modules to capture the heterogeneous characteristics of different languages, but they suffer from parameter explosion. In this paper, we propose a "divide and conquer" multilingual translation training method based on neuron importance that mitigates negative language interference effectively without adding any parameters. The method comprises four steps: estimation, pruning, distillation, and fine-tuning. Specifically, we estimate the importance of the neurons in an existing pre-trained model, dividing them into important neurons, which represent general knowledge shared across languages, and unimportant neurons, which represent individual knowledge of each low-resource language. We then prune the pre-trained model, retaining only the important neurons, and train the pruned model under the supervision of the original complete model via selective distillation to compensate for the performance loss caused by unstructured pruning. Finally, we restore the pruned neurons and fine-tune only them. Experimental results on several language pairs demonstrate the effectiveness of the proposed method.
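The estimation, pruning, and selective-distillation steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a magnitude-based importance criterion and an MSE distillation loss restricted to the retained neurons, both of which are placeholder choices (the paper's exact importance estimator and loss may differ).

```python
import numpy as np

def importance_scores(weights):
    # Proxy importance: summed absolute outgoing weight per neuron
    # (assumption; the paper's estimator may be gradient/Taylor-based).
    return np.abs(weights).sum(axis=1)

def prune_mask(scores, keep_ratio):
    # Keep the top-k neurons by importance; the rest are pruned
    # (these would later be restored and fine-tuned).
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

def selective_distill_loss(student_act, teacher_act, mask):
    # Selective distillation: match the complete (teacher) model's
    # activations only on the retained ("important") neurons.
    diff = (student_act - teacher_act)[:, mask]
    return float((diff ** 2).mean())

# Toy usage: 3 neurons, each with 2 outgoing weights.
weights = np.array([[1.0, 1.0], [0.1, 0.1], [2.0, 2.0]])
scores = importance_scores(weights)
mask = prune_mask(scores, keep_ratio=2 / 3)   # retains neurons 0 and 2
loss = selective_distill_loss(np.ones((4, 3)), np.ones((4, 3)), mask)
```

The mask here plays the role of the "divide" step: the retained neurons are trained with teacher supervision, while the pruned ones are restored afterwards and fine-tuned on their own.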

Original language: English
Title of host publication: Knowledge Science, Engineering and Management - 16th International Conference, KSEM 2023, Proceedings
Editors: Zhi Jin, Yuncheng Jiang, Wenjun Ma, Robert Andrei Buchmann, Ana-Maria Ghiran, Yaxin Bi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 140-150
Number of pages: 11
ISBN (Print): 9783031402913
DOIs
Publication status: Published - 2023
Event: Knowledge Science, Engineering and Management - 16th International Conference, KSEM 2023 - Guangzhou, China
Duration: 16 Aug 2023 - 18 Aug 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14120 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Knowledge Science, Engineering and Management - 16th International Conference, KSEM 2023
Country/Territory: China
City: Guangzhou
Period: 16/08/23 - 18/08/23

Keywords

  • Importance estimation
  • Multilingual translation
  • Negative language interference
  • Pruning
  • Selective knowledge distillation
