TY - GEN
T1 - Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning
AU - Zhao, Sen
AU - Wei, Wei
AU - Liu, Yifan
AU - Wang, Ziyang
AU - Li, Wendi
AU - Mao, Xian Ling
AU - Zhu, Shuai
AU - Yang, Minghui
AU - Wen, Zujie
N1 - Publisher Copyright: © 2023 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2023
Y1 - 2023
AB - Conversational recommendation systems (CRS) aim to acquire users' dynamic preferred attributes in a timely and proactive manner through conversation, in order to recommend items. Each turn of a CRS naturally involves two decision-making processes with different roles that influence each other: 1) the director, which selects the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) the actor, which accordingly chooses primitive actions (i.e., the attribute to ask about or the item to recommend) that satisfy user preferences and gives feedback for estimating the effectiveness of the director's option. However, existing methods rely heavily on a unified decision-making module or heuristic rules, neglecting both the distinct roles of the different decision procedures and the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), in which the director selects the most effective option and the actor then chooses primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train the director from weak supervision. Finally, to alleviate the adverse effect of model bias on the mutual influence between the director and actor, we model the director's option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
UR - https://www.scopus.com/pages/publications/85170393886
U2 - 10.24963/ijcai.2023/273
DO - 10.24963/ijcai.2023/273
M3 - Conference contribution
AN - SCOPUS:85170393886
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 2459
EP - 2467
BT - Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
A2 - Elkind, Edith
PB - International Joint Conferences on Artificial Intelligence
T2 - 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Y2 - 19 August 2023 through 25 August 2023
ER -