TY - GEN
T1 - Enhancing Zero-Shot Translation in Multilingual Neural Machine Translation
T2 - 33rd International Conference on Artificial Neural Networks, ICANN 2024
AU - Zhang, Jiarui
AU - Huang, Heyan
AU - Hu, Yue
AU - Guo, Ping
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - In the field of multilingual neural machine translation, a notable challenge is zero-shot translation, where a model translates languages that it has not been trained on. This often results in poor translation quality, mainly because the model’s internal language representations are too specific to its training languages. We illustrate that the positional relationship to input tokens is a primary factor contributing to the language-specific representations. We find a solution by modifying the model’s structure, specifically by removing certain connections in its encoder layer. This simple change significantly improves the quality of zero-shot translations, with an increase of up to 11.1 BLEU points, a measure of translation accuracy. Importantly, this improvement does not affect the quality of translations for the languages the model was trained on. Besides, our method facilitates the seamless incorporation of new languages, significantly broadening the scope of translation coverage.
KW - location-agnostic representations
KW - multilingual neural machine translation
KW - removing certain connections
KW - zero-shot translation
UR - http://www.scopus.com/inward/record.url?scp=85205310723&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72350-6_13
DO - 10.1007/978-3-031-72350-6_13
M3 - Conference contribution
AN - SCOPUS:85205310723
SN - 9783031723490
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 194
EP - 208
BT - Artificial Neural Networks and Machine Learning – ICANN 2024 - 33rd International Conference on Artificial Neural Networks, Proceedings
A2 - Wand, Michael
A2 - Schmidhuber, Jürgen
A2 - Malinovská, Kristína
A2 - Tetko, Igor V.
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 17 September 2024 through 20 September 2024
ER -