TY - JOUR
T1 - OSTOD
T2 - One-Step Task-Oriented Dialogue with activated state and retelling response
AU - Huang, Heyan
AU - Yang, Puhai
AU - Wei, Wei
AU - Shi, Shumin
AU - Mao, Xian Ling
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/6/7
Y1 - 2024/6/7
N2 - At present, progress in conversational AI research is being greatly propelled by large-scale pre-trained language models. In particular, task-oriented dialogue systems have gained widespread attention owing to their immense potential in helping individuals accomplish diverse objectives, such as booking hotels, making restaurant reservations, and purchasing train tickets. In the past, task-oriented dialogue systems were typically viewed as a multi-step process that included spoken language understanding, dialogue state tracking, dialogue policy learning, and natural language generation. More recently, large-scale pre-trained language models have enabled the development of end-to-end neural pipeline task-oriented dialogue systems, which combine multiple steps into a single model, allowing for joint optimization and preventing error propagation. However, in order to explicitly retrieve information from databases to ensure the interpretability of the system, almost all end-to-end neural pipeline methods require predicting the dialogue state as an intermediate result specialized for the domain or task, which poses significant challenges for generalization. To solve this problem, we propose One-Step Task-Oriented Dialogue (OSTOD), which models task-oriented dialogue by synchronously generating activated states and retelling responses, where activated states refer to slot values that contribute to database access, and retelling responses are system responses that contain activated state information. Specifically, automatic methods are first designed to build data containing activated states and retelling responses. Then, a joint generation model that synchronously predicts activated states and retelling responses in a single step is proposed for task-oriented dialogue modelling. Empirical results on the MultiWOZ 2.0 and MultiWOZ 2.1 datasets show that our OSTOD model performs comparably to state-of-the-art baselines. Moreover, our model exhibits exceptional generalization capabilities in few-shot learning and domain transfer scenarios.
AB - At present, progress in conversational AI research is being greatly propelled by large-scale pre-trained language models. In particular, task-oriented dialogue systems have gained widespread attention owing to their immense potential in helping individuals accomplish diverse objectives, such as booking hotels, making restaurant reservations, and purchasing train tickets. In the past, task-oriented dialogue systems were typically viewed as a multi-step process that included spoken language understanding, dialogue state tracking, dialogue policy learning, and natural language generation. More recently, large-scale pre-trained language models have enabled the development of end-to-end neural pipeline task-oriented dialogue systems, which combine multiple steps into a single model, allowing for joint optimization and preventing error propagation. However, in order to explicitly retrieve information from databases to ensure the interpretability of the system, almost all end-to-end neural pipeline methods require predicting the dialogue state as an intermediate result specialized for the domain or task, which poses significant challenges for generalization. To solve this problem, we propose One-Step Task-Oriented Dialogue (OSTOD), which models task-oriented dialogue by synchronously generating activated states and retelling responses, where activated states refer to slot values that contribute to database access, and retelling responses are system responses that contain activated state information. Specifically, automatic methods are first designed to build data containing activated states and retelling responses. Then, a joint generation model that synchronously predicts activated states and retelling responses in a single step is proposed for task-oriented dialogue modelling. Empirical results on the MultiWOZ 2.0 and MultiWOZ 2.1 datasets show that our OSTOD model performs comparably to state-of-the-art baselines. Moreover, our model exhibits exceptional generalization capabilities in few-shot learning and domain transfer scenarios.
KW - Dialogue state tracking
KW - End-to-end dialogue
KW - Response generation
KW - Retelling response
KW - Task-oriented dialogue
UR - http://www.scopus.com/inward/record.url?scp=85189671986&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.111677
DO - 10.1016/j.knosys.2024.111677
M3 - Article
AN - SCOPUS:85189671986
SN - 0950-7051
VL - 293
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 111677
ER -