TY - JOUR
T1 - Efficient and Effective Role Player
T2 - A Compact Knowledge-grounded Persona-based Dialogue Model Enhanced by LLM Distillation
AU - Hu, Linmei
AU - Zhang, Xinyu
AU - Song, Dandan
AU - Zhou, Changzhi
AU - He, Hongyu
AU - Nie, Liqiang
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/02/27
Y1 - 2025/02/27
N2 - Incorporating explicit personas into dialogue models is critical for generating responses that fulfill specific user needs and preferences, creating a more personalized and engaging interaction. Early works on persona-based dialogue generation directly concatenate the persona descriptions and dialogue history as input to relatively small pre-trained language models (PLMs) for response generation, which leads to uninformative and inferior results due to the sparse persona information and the limited generation capabilities of these models. Recently, large language models (LLMs) have shown surprising capabilities in language generation, and prompting LLMs with persona descriptions for role-playing dialogue generation has also achieved promising results. However, deploying LLMs is challenging for practical applications due to their large scale, spurring efforts to distill their generation capabilities into more concise and compact models through teacher-student learning. In this article, we propose an efficient and compact Knowledge-grounded Persona-based Dialogue model enhanced by LLM Distillation (KPDD). Specifically, we first propose to enrich the annotated persona descriptions by integrating external knowledge graphs (KGs) with a mixed encoding network, coupled with a mixture of experts (MoE) module for both informative and diverse response generation. The mixed encoding network contains multiple layers of modality interaction operations, enabling information from each modality to propagate to the other. Second, to fully exploit the generation capabilities of LLMs, we employ distillation to transfer these capabilities to our model, facilitated by a natural language inference (NLI)-based filtering mechanism that extracts high-quality information from LLMs. In addition, we adopt a curriculum learning strategy that trains our model on the high-quality filtered distilled data and then progressively on the relatively noisy original data, enhancing its adaptability and performance. Extensive experiments show that KPDD outperforms state-of-the-art baselines in terms of both automatic and human evaluation.
AB - Incorporating explicit personas into dialogue models is critical for generating responses that fulfill specific user needs and preferences, creating a more personalized and engaging interaction. Early works on persona-based dialogue generation directly concatenate the persona descriptions and dialogue history as input to relatively small pre-trained language models (PLMs) for response generation, which leads to uninformative and inferior results due to the sparse persona information and the limited generation capabilities of these models. Recently, large language models (LLMs) have shown surprising capabilities in language generation, and prompting LLMs with persona descriptions for role-playing dialogue generation has also achieved promising results. However, deploying LLMs is challenging for practical applications due to their large scale, spurring efforts to distill their generation capabilities into more concise and compact models through teacher-student learning. In this article, we propose an efficient and compact Knowledge-grounded Persona-based Dialogue model enhanced by LLM Distillation (KPDD). Specifically, we first propose to enrich the annotated persona descriptions by integrating external knowledge graphs (KGs) with a mixed encoding network, coupled with a mixture of experts (MoE) module for both informative and diverse response generation. The mixed encoding network contains multiple layers of modality interaction operations, enabling information from each modality to propagate to the other. Second, to fully exploit the generation capabilities of LLMs, we employ distillation to transfer these capabilities to our model, facilitated by a natural language inference (NLI)-based filtering mechanism that extracts high-quality information from LLMs. In addition, we adopt a curriculum learning strategy that trains our model on the high-quality filtered distilled data and then progressively on the relatively noisy original data, enhancing its adaptability and performance. Extensive experiments show that KPDD outperforms state-of-the-art baselines in terms of both automatic and human evaluation.
KW - Curriculum Learning
KW - Distillation
KW - Knowledge Graph
KW - Large Language Model
KW - MoE
KW - Persona-based Dialogue Generation
UR - http://www.scopus.com/inward/record.url?scp=105005637895&partnerID=8YFLogxK
U2 - 10.1145/3711857
DO - 10.1145/3711857
M3 - Article
AN - SCOPUS:105005637895
SN - 1046-8188
VL - 43
JO - ACM Transactions on Information Systems
JF - ACM Transactions on Information Systems
IS - 3
M1 - 59
ER -