TY - GEN
T1 - Teaching Large Language Models to Translate on Low-resource Languages with Textbook Prompting
AU - Guo, Ping
AU - Ren, Yubing
AU - Hu, Yue
AU - Li, Yunpeng
AU - Zhang, Jiarui
AU - Zhang, Xingsheng
AU - Huang, Heyan
N1 - Publisher Copyright:
© 2024 ELRA Language Resource Association: CC BY-NC 4.0.
PY - 2024
Y1 - 2024
N2 - Large Language Models (LLMs) have achieved impressive results in Machine Translation by simply following instructions, even without training on parallel data. However, LLMs still face challenges with low-resource languages due to a lack of pre-training data. In real-world situations, humans can become proficient in their native languages through abundant and meaningful social interactions and can also learn foreign languages effectively from well-organized textbooks. Drawing inspiration from these human learning patterns, we introduce the Translate After LEarNing Textbook (TALENT) approach, which aims to enhance LLMs' ability to translate low-resource languages by learning from a textbook. TALENT follows a step-by-step process: (1) creating a Textbook for low-resource languages; (2) guiding LLMs to absorb the Textbook's content and derive Syntax Patterns; (3) enhancing translation by utilizing the Textbook and Syntax Patterns. We thoroughly assess TALENT's performance on 112 low-resource languages from FLORES-200 with two LLMs: ChatGPT and BLOOMZ. Evaluation across three different metrics reveals that TALENT consistently improves translation performance by 14.8% over zero-shot baselines. Further analysis demonstrates that TALENT not only improves LLMs' comprehension of low-resource languages but also equips them with the knowledge needed to generate accurate and fluent sentences in these languages.
KW - Large Language Models
KW - Low-resource Language Evaluation
KW - Multilingual Machine Translation
UR - http://www.scopus.com/inward/record.url?scp=85195997831&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85195997831
T3 - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
SP - 15685
EP - 15697
BT - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Kan, Min-Yen
A2 - Hoste, Veronique
A2 - Lenci, Alessandro
A2 - Sakti, Sakriani
A2 - Xue, Nianwen
PB - European Language Resources Association (ELRA)
T2 - Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Y2 - 20 May 2024 through 25 May 2024
ER -