Behavioral Cloning Based Model Generation Method for Reinforcement Learning

Dengmin Xiao; Bo Wang; Zhongqi Sun; Xiao He

doi:10.1109/CAC59555.2023.10450935

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Reinforcement learning (RL) methods that train agents in simulation are well suited to behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, that generates an appropriate model for further RL. The proposed TD-BC algorithm is designed to train the policy network and the value network simultaneously using expert experience. The policy network is updated by training its output to be as consistent as possible with the action given by the expert. The difference between the value network's outputs for the next state and the current state is then used as the TD error to update the value network. Subsequent training tasks can then be completed through simple fine-tuning, which reduces the time spent accumulating online learning data and improves the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated in single-agent and multi-agent cases. In the simulations, behavior trees derived from expert experience are used to generate the historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates RL.
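
The two updates described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it replaces the policy and value networks with lookup tables, assumes the standard TD(0) error (reward plus discounted next-state value minus current-state value), and uses a hypothetical function name `td_bc_pretrain`, hypothetical hyperparameters, and a made-up two-step expert trajectory.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def td_bc_pretrain(transitions, n_states, n_actions,
                   gamma=0.9, lr_pi=0.5, lr_v=0.1, epochs=200):
    """Pretrain a tabular policy and value function from expert data.

    transitions: list of (state, expert_action, reward, next_state, done).
    The tables stand in for the policy and value networks (an assumption).
    """
    logits = [[0.0] * n_actions for _ in range(n_states)]
    values = [0.0] * n_states
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            # Behavioral cloning step: gradient descent on the cross-entropy
            # between the policy distribution and the expert's action.
            probs = softmax(logits[s])
            for k in range(n_actions):
                target = 1.0 if k == a else 0.0
                logits[s][k] -= lr_pi * (probs[k] - target)
            # TD(0) step (assumed form): move V(s) toward r + gamma * V(s').
            bootstrap = 0.0 if done else gamma * values[s_next]
            td_error = r + bootstrap - values[s]
            values[s] += lr_v * td_error
    return logits, values


if __name__ == "__main__":
    # Made-up expert data: a 3-state chain where the expert always picks
    # action 1 and receives reward 1.0 on reaching terminal state 2.
    demo = [(0, 1, 0.0, 1, False), (1, 1, 1.0, 2, True)]
    logits, values = td_bc_pretrain(demo, n_states=3, n_actions=2)
    print("P(expert action | s=0):", round(softmax(logits[0])[1], 3))
    print("V:", [round(v, 3) for v in values])
```

Both tables are fitted from the same expert transitions in one pass, so a later RL stage could start from an imitating policy and a value estimate that is already consistent with the expert's returns, rather than from random initialization.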

Original language	English
Title of host publication	Proceedings - 2023 China Automation Congress, CAC 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	6776-6781
Number of pages	6
ISBN (Electronic)	9798350303759
DOIs	https://doi.org/10.1109/CAC59555.2023.10450935
Publication status	Published - 2023
Event	2023 China Automation Congress, CAC 2023 - Chongqing, China. Duration: 17 Nov 2023 → 19 Nov 2023

Publication series

Name	Proceedings - 2023 China Automation Congress, CAC 2023

Conference

Conference	2023 China Automation Congress, CAC 2023
Country/Territory	China
City	Chongqing
Period	17/11/23 → 19/11/23

Keywords

accelerated training
behavior cloning
deep reinforcement learning
imitation learning

Access to Document

10.1109/CAC59555.2023.10450935

Cite this

@inproceedings{68738c66736a42b2a24ee4af96873c8e,
  title = "Behavioral Cloning Based Model Generation Method for Reinforcement Learning",
  abstract = "Reinforcement learning (RL) methods that train agents using simulation are well suited to solve behavioral decision-making problems. However, complex simulation platforms that have slow processing speed make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Considering that the simulation data based on the expert experience are valuable, this paper proposes a new algorithm, which is derived from the behavioral cloning (BC) method, to generate the appropriate model for further RL. The proposed TD-BC algorithm is specifically designed to train policy network and value network simultaneously by using expert experience. We update the policy network by training the model output to be as consistent as possible with the given action of the expert. Then the difference between the value network output of the next moment state and the current moment state is used as the TD error to update the value network. Finally, the subsequent training tasks can be completed through simple fine-tuning with reducing the accumulation time of online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated through the cases with single agent and multiple agents, respectively. In the simulation, we use behavior trees derived from expert experiences to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerate the process of RL.",
  keywords = "accelerated training, behavior cloning, deep reinforcement learning, imitation learning",
  author = "Dengmin Xiao and Bo Wang and Zhongqi Sun and Xiao He",
  note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 China Automation Congress, CAC 2023 ; Conference date: 17-11-2023 Through 19-11-2023",
  year = "2023",
  doi = "10.1109/CAC59555.2023.10450935",
  language = "English",
  series = "Proceedings - 2023 China Automation Congress, CAC 2023",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "6776--6781",
  booktitle = "Proceedings - 2023 China Automation Congress, CAC 2023",
  address = "United States",
}

Xiao, D, Wang, B, Sun, Z & He, X 2023, Behavioral Cloning Based Model Generation Method for Reinforcement Learning. in Proceedings - 2023 China Automation Congress, CAC 2023. Proceedings - 2023 China Automation Congress, CAC 2023, Institute of Electrical and Electronics Engineers Inc., pp. 6776-6781, 2023 China Automation Congress, CAC 2023, Chongqing, China, 17/11/23. https://doi.org/10.1109/CAC59555.2023.10450935

Behavioral Cloning Based Model Generation Method for Reinforcement Learning. / Xiao, Dengmin; Wang, Bo; Sun, Zhongqi et al.
Proceedings - 2023 China Automation Congress, CAC 2023. Institute of Electrical and Electronics Engineers Inc., 2023. p. 6776-6781 (Proceedings - 2023 China Automation Congress, CAC 2023).

TY - GEN

T1 - Behavioral Cloning Based Model Generation Method for Reinforcement Learning

AU - Xiao, Dengmin

AU - Wang, Bo

AU - Sun, Zhongqi

AU - He, Xiao

PY - 2023

Y1 - 2023

N2 - Reinforcement learning (RL) methods that train agents using simulation are well suited to solve behavioral decision-making problems. However, complex simulation platforms that have slow processing speed make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Considering that the simulation data based on the expert experience are valuable, this paper proposes a new algorithm, which is derived from the behavioral cloning (BC) method, to generate the appropriate model for further RL. The proposed TD-BC algorithm is specifically designed to train policy network and value network simultaneously by using expert experience. We update the policy network by training the model output to be as consistent as possible with the given action of the expert. Then the difference between the value network output of the next moment state and the current moment state is used as the TD error to update the value network. Finally, the subsequent training tasks can be completed through simple fine-tuning with reducing the accumulation time of online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated through the cases with single agent and multiple agents, respectively. In the simulation, we use behavior trees derived from expert experiences to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerate the process of RL.

AB - Reinforcement learning (RL) methods that train agents using simulation are well suited to solve behavioral decision-making problems. However, complex simulation platforms that have slow processing speed make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Considering that the simulation data based on the expert experience are valuable, this paper proposes a new algorithm, which is derived from the behavioral cloning (BC) method, to generate the appropriate model for further RL. The proposed TD-BC algorithm is specifically designed to train policy network and value network simultaneously by using expert experience. We update the policy network by training the model output to be as consistent as possible with the given action of the expert. Then the difference between the value network output of the next moment state and the current moment state is used as the TD error to update the value network. Finally, the subsequent training tasks can be completed through simple fine-tuning with reducing the accumulation time of online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated through the cases with single agent and multiple agents, respectively. In the simulation, we use behavior trees derived from expert experiences to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerate the process of RL.

KW - accelerated training

KW - behavior cloning

KW - deep reinforcement learning

KW - imitation learning

UR - http://www.scopus.com/inward/record.url?scp=85189348821&partnerID=8YFLogxK

U2 - 10.1109/CAC59555.2023.10450935

DO - 10.1109/CAC59555.2023.10450935

M3 - Conference contribution

AN - SCOPUS:85189348821

T3 - Proceedings - 2023 China Automation Congress, CAC 2023

SP - 6776

EP - 6781

BT - Proceedings - 2023 China Automation Congress, CAC 2023

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2023 China Automation Congress, CAC 2023

Y2 - 17 November 2023 through 19 November 2023

ER -
