Behavioral Cloning Based Model Generation Method for Reinforcement Learning

Dengmin Xiao, Bo Wang, Zhongqi Sun, Xiao He

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Reinforcement learning (RL) methods that train agents in simulation are well suited to behavioral decision-making problems. However, complex simulation platforms run slowly, which makes RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, that generates an appropriate initial model for further RL. The proposed TD-BC algorithm is specifically designed to train the policy network and the value network simultaneously from expert experience. The policy network is updated by training the model output to be as consistent as possible with the action given by the expert. The difference between the value network's outputs for the next state and the current state is then used as the temporal-difference (TD) error to update the value network. Finally, subsequent training tasks can be completed through simple fine-tuning, which reduces the time needed to accumulate online learning data and improves the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated in both single-agent and multi-agent cases. In the simulation, behavior trees derived from expert experience are used to generate the historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates the RL process.
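
The two updates described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the policy and value networks with small lookup tables, and it assumes the standard TD(0) error with a reward and a discount factor, which the abstract does not spell out. The function name `td_bc_pretrain`, the toy chain environment, and all hyperparameters are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def td_bc_pretrain(transitions, n_states, n_actions,
                   epochs=200, lr_pi=0.5, lr_v=0.1, gamma=0.9, seed=0):
    """Pretrain a tabular policy and value function from expert data.

    transitions: list of (state, expert_action, reward, next_state, done).
    Assumption: tables stand in for the policy and value networks.
    """
    rng = random.Random(seed)
    logits = [[0.0] * n_actions for _ in range(n_states)]  # policy table
    values = [0.0] * n_states                              # value table
    data = list(transitions)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, r, s_next, done in data:
            # BC step: move the policy toward the expert action.
            # Gradient of cross-entropy w.r.t. logits is (probs - one_hot).
            probs = softmax(logits[s])
            for k in range(n_actions):
                grad = probs[k] - (1.0 if k == a else 0.0)
                logits[s][k] -= lr_pi * grad
            # TD step (assumed TD(0) form): bootstrap from the next state.
            target = r + (0.0 if done else gamma * values[s_next])
            td_error = target - values[s]
            values[s] += lr_v * td_error
    return logits, values


# Hypothetical expert data: a 4-state chain where the expert always takes
# action 1 ("right") and receives reward 1 on reaching terminal state 3.
expert_data = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),
]
logits, values = td_bc_pretrain(expert_data, n_states=4, n_actions=2)
greedy = [max(range(2), key=lambda k: logits[s][k]) for s in range(3)]
```

In this sketch both tables are fitted from the same expert transitions, so a later RL stage could start from `logits` and `values` instead of from zero-initialized parameters.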

Original language: English
Title of host publication: Proceedings - 2023 China Automation Congress, CAC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6776-6781
Number of pages: 6
ISBN (Electronic): 9798350303759
Publication status: Published - 2023
Event: 2023 China Automation Congress, CAC 2023 - Chongqing, China
Duration: 17 Nov 2023 to 19 Nov 2023

Publication series

Name: Proceedings - 2023 China Automation Congress, CAC 2023

Conference

Conference: 2023 China Automation Congress, CAC 2023
Country/Territory: China
City: Chongqing
Period: 17/11/23 to 19/11/23

Keywords

  • accelerated training
  • behavior cloning
  • deep reinforcement learning
  • imitation learning
