TY - GEN
T1 - Behavioral Cloning Based Model Generation Method for Reinforcement Learning
AU - Xiao, Dengmin
AU - Wang, Bo
AU - Sun, Zhongqi
AU - He, Xiao
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Reinforcement learning (RL) methods that train agents in simulation are well suited to solving behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, to generate an appropriate model for further RL. The proposed TD-BC algorithm is designed to train the policy network and the value network simultaneously using expert experience. We update the policy network by training its output to be as consistent as possible with the expert's given action. The difference between the value network's outputs for the next state and the current state is then used as the TD error to update the value network. Finally, subsequent training tasks can be completed through simple fine-tuning, reducing the time needed to accumulate online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated on both single-agent and multi-agent cases. In the simulation, we use behavior trees derived from expert experience to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates the RL process.
AB - Reinforcement learning (RL) methods that train agents in simulation are well suited to solving behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, to generate an appropriate model for further RL. The proposed TD-BC algorithm is designed to train the policy network and the value network simultaneously using expert experience. We update the policy network by training its output to be as consistent as possible with the expert's given action. The difference between the value network's outputs for the next state and the current state is then used as the TD error to update the value network. Finally, subsequent training tasks can be completed through simple fine-tuning, reducing the time needed to accumulate online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated on both single-agent and multi-agent cases. In the simulation, we use behavior trees derived from expert experience to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates the RL process.
KW - accelerated training
KW - behavior cloning
KW - deep reinforcement learning
KW - imitation learning
UR - http://www.scopus.com/inward/record.url?scp=85189348821&partnerID=8YFLogxK
U2 - 10.1109/CAC59555.2023.10450935
DO - 10.1109/CAC59555.2023.10450935
M3 - Conference contribution
AN - SCOPUS:85189348821
T3 - Proceedings - 2023 China Automation Congress, CAC 2023
SP - 6776
EP - 6781
BT - Proceedings - 2023 China Automation Congress, CAC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 China Automation Congress, CAC 2023
Y2 - 17 November 2023 through 19 November 2023
ER -
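
Editorial note on the abstract's method description: the abstract states that TD-BC trains a policy network to match expert actions (a BC loss) and a value network from the TD error between consecutive states, using expert transitions rather than online rollouts. The sketch below illustrates one plausible reading of that update, in PyTorch. It is not the paper's implementation: the network shapes, the discrete-action cross-entropy BC loss, the hypothetical names (PolicyNet, ValueNet, td_bc_update), and the standard one-step TD target r + gamma * V(s') are all assumptions, since the abstract describes the TD error only loosely.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    # Illustrative policy network: state -> action logits (discrete actions assumed).
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, s):
        return self.net(s)

class ValueNet(nn.Module):
    # Illustrative value network: state -> scalar value estimate.
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def td_bc_update(policy, value, pi_opt, v_opt, batch, gamma=0.99):
    # batch holds expert transitions from historical simulation data
    # (e.g. generated by behavior trees, as in the paper's experiments).
    s, a, r, s_next, done = batch

    # BC step: push the policy's output toward the expert's action.
    bc_loss = F.cross_entropy(policy(s), a)
    pi_opt.zero_grad()
    bc_loss.backward()
    pi_opt.step()

    # TD step: regress V(s) toward a one-step TD target built from the
    # same expert transitions (assumed form; the abstract only mentions
    # the difference between next-state and current-state value outputs).
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * value(s_next)
    td_loss = F.mse_loss(value(s), target)
    v_opt.zero_grad()
    td_loss.backward()
    v_opt.step()

    return bc_loss.item(), td_loss.item()

After pretraining both networks this way on the expert dataset, they would serve as the initialization ("high starting point") for a standard actor-critic fine-tuning phase, which is the acceleration effect the abstract claims.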