Incorporating Terminology Knowledge into Large Language Model for Domain-Specific Machine Translation

Xuan Zhao, Chong Feng*, Shuanghong Huang, Jiangyu Wang, Haojie Xu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Achieving high-quality domain-specific translation with only a small amount of domain knowledge is a challenging task. Large language models (LLMs) can now generate more fluent, human-preferred translations when given personalized instructions. In domain-specific machine translation, however, LLMs underperform traditional methods because they lack domain training data and domain transfer ability. To address this issue, we incorporate terminology knowledge, which is crucial for capturing the precise semantics of domain-specific texts. We design two types of terminology alignment instructions that enhance the model’s cross-lingual terminology alignment capability and explicitly integrate terminology knowledge into the training process. In our experiments, the model fine-tuned with MT+G-Align significantly outperformed the baseline in both terminology translation accuracy and translation quality, demonstrating the effectiveness of the terminology alignment instructions. On the WMT 2023 Terminology Translation task, our approach achieves the best results in all three directions: German-to-English, Chinese-to-English, and English-to-Chinese.

Original language: English
Title of host publication: Machine Translation - 20th China Conference, CCMT 2024, Proceedings
Editors: Zhongjun He, Yidong Chen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 82-96
Number of pages: 15
ISBN (Print): 9789819622917
DOIs
Publication status: Published - 2025
Event: 20th China Conference on Machine Translation, CCMT 2024 - Xiamen, China
Duration: 8 Nov 2024 → 10 Nov 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2365 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 20th China Conference on Machine Translation, CCMT 2024
Country/Territory: China
City: Xiamen
Period: 8/11/24 → 10/11/24

Keywords

  • Domain-specific translation
  • Large language model
  • Terminology alignment instruction
