Behavioral Cloning Based Model Generation Method for Reinforcement Learning

Dengmin Xiao, Bo Wang, Zhongqi Sun, Xiao He

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-reviewed

Abstract

Reinforcement learning (RL) methods that train agents in simulation are well suited to behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, to generate an appropriate model for further RL. The proposed TD-BC algorithm is specifically designed to train the policy network and the value network simultaneously using expert experience. The policy network is updated by training the model output to be as consistent as possible with the action given by the expert. The difference between the value network outputs for the next state and the current state is then used as the temporal-difference (TD) error to update the value network. Subsequent training tasks can then be completed through simple fine-tuning, which reduces the time needed to accumulate online learning data and improves the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated in both single-agent and multi-agent cases. In the simulations, behavior trees derived from expert experience are used to generate the historical data. The results show that the TD-BC algorithm can learn from expert experience, which provides a high starting point for training and thus accelerates RL.
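The two updates described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the network architecture, the loss functions, or the exact form of the TD error. The sketch assumes linear function approximators in place of neural networks, a cross-entropy loss for the behavioral cloning step, and the standard TD(0) error `r + gamma * V(s') - V(s)` with a reward and discount factor that the abstract does not mention. All names (`td_bc_pretrain`, `act`, the toy chain data) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def td_bc_pretrain(transitions, n_features, n_actions,
                   epochs=200, lr_policy=0.5, lr_value=0.1, gamma=0.9):
    """Pretrain a linear policy and a linear value function from expert data.

    transitions: list of (state, expert_action, reward, next_state, done).
    The reward, gamma, and done flag are assumptions of this sketch.
    """
    policy_w = [[0.0] * n_features for _ in range(n_actions)]
    value_w = [0.0] * n_features
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            # Behavioral cloning step: gradient of the cross-entropy between
            # the policy's action distribution and the expert's action.
            probs = softmax([dot(w, s) for w in policy_w])
            for k in range(n_actions):
                grad = probs[k] - (1.0 if k == a else 0.0)
                for j in range(n_features):
                    policy_w[k][j] -= lr_policy * grad * s[j]
            # Value step: semi-gradient TD(0) update (assumed form).
            v_next = 0.0 if done else dot(value_w, s_next)
            td_error = r + gamma * v_next - dot(value_w, s)
            for j in range(n_features):
                value_w[j] += lr_value * td_error * s[j]
    return policy_w, value_w


def act(policy_w, state):
    """Greedy action of the pretrained policy."""
    logits = [dot(w, state) for w in policy_w]
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical expert data: a 3-state chain with one-hot state features.
# The expert always takes action 1 ("move right"); leaving the last state
# ends the episode with reward 1.
states = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
terminal = [0.0, 0.0, 0.0]
transitions = [
    (states[0], 1, 0.0, states[1], False),
    (states[1], 1, 0.0, states[2], False),
    (states[2], 1, 1.0, terminal, True),
]

policy_w, value_w = td_bc_pretrain(transitions, n_features=3, n_actions=2)
cloned_actions = [act(policy_w, s) for s in states]
```

On this toy chain the cloned policy reproduces the expert's action in every state, and the value weights approach the discounted returns of the expert trajectory (about 0.81, 0.9, and 1.0 for `gamma=0.9`). In the setting the abstract describes, the pretrained policy and value parameters would then serve as the starting point for RL fine-tuning.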

Original language: English
Title of host publication: Proceedings - 2023 China Automation Congress, CAC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6776-6781
Number of pages: 6
ISBN (electronic): 9798350303759
DOI
Publication status: Published - 2023
Event: 2023 China Automation Congress, CAC 2023 - Chongqing, China
Duration: 17 Nov 2023 to 19 Nov 2023

Publication series

Name: Proceedings - 2023 China Automation Congress, CAC 2023

Conference

Conference: 2023 China Automation Congress, CAC 2023
Country/Territory: China
City: Chongqing
Period: 17/11/23 to 19/11/23
