TY - GEN
T1 - Incorporating Terminology Knowledge into Large Language Model for Domain-Specific Machine Translation
AU - Zhao, Xuan
AU - Feng, Chong
AU - Huang, Shuanghong
AU - Wang, Jiangyu
AU - Xu, Haojie
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Utilizing a small amount of domain knowledge to achieve high-quality domain-specific translation is a challenging task. Large language models (LLMs) can now generate fluent, human-preferred translations through personalized instructions. However, in domain-specific machine translation, LLMs underperform traditional methods due to the lack of domain training data and the absence of domain transfer ability. To address this issue, we incorporate terminology knowledge, which is crucial for accurately capturing the precise semantics of domain-specific texts. We design two types of terminology alignment instructions that explicitly integrate terminology knowledge into the training process and enhance the model’s cross-lingual terminology alignment capability. In our experiments, the model fine-tuned with MT+G-Align significantly outperforms the baseline in both terminology translation accuracy and translation quality, demonstrating the effectiveness of the terminology alignment instructions. On the WMT 2023 Terminology Translation task, our approach achieves the best results in all three directions: German-to-English, Chinese-to-English, and English-to-Chinese.
KW - Domain-specific translation
KW - Large language model
KW - Terminology alignment instruction
UR - http://www.scopus.com/inward/record.url?scp=86000455558&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-2292-4_6
DO - 10.1007/978-981-96-2292-4_6
M3 - Conference contribution
AN - SCOPUS:86000455558
SN - 9789819622917
T3 - Communications in Computer and Information Science
SP - 82
EP - 96
BT - Machine Translation - 20th China Conference, CCMT 2024, Proceedings
A2 - He, Zhongjun
A2 - Chen, Yidong
PB - Springer Science and Business Media Deutschland GmbH
T2 - 20th China Conference on Machine Translation, CCMT 2024
Y2 - 8 November 2024 through 10 November 2024
ER -