TY - GEN
T1 - Behavioral Cloning Based Model Generation Method for Reinforcement Learning
AU - Xiao, Dengmin
AU - Wang, Bo
AU - Sun, Zhongqi
AU - He, Xiao
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Reinforcement learning (RL) methods that train agents in simulation are well suited to solving behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, to generate an appropriate model for further RL. The proposed TD-BC algorithm is designed to train the policy network and the value network simultaneously using expert experience. We update the policy network by training its output to be as consistent as possible with the expert's given action. The difference between the value network's outputs for the next state and the current state is then used as the TD error to update the value network. Finally, subsequent training tasks can be completed through simple fine-tuning, reducing the time needed to accumulate online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated on both single-agent and multi-agent cases. In the simulation, we use behavior trees derived from expert experience to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates the RL process.
AB - Reinforcement learning (RL) methods that train agents in simulation are well suited to solving behavioral decision-making problems. However, complex simulation platforms with slow processing speeds make RL time-consuming. It is therefore necessary to make full use of expert experience and historical simulation data to avoid training from scratch each time. Because simulation data based on expert experience are valuable, this paper proposes a new algorithm, derived from the behavioral cloning (BC) method, to generate an appropriate model for further RL. The proposed TD-BC algorithm is designed to train the policy network and the value network simultaneously using expert experience. We update the policy network by training its output to be as consistent as possible with the expert's given action. The difference between the value network's outputs for the next state and the current state is then used as the TD error to update the value network. Finally, subsequent training tasks can be completed through simple fine-tuning, reducing the time needed to accumulate online learning data and improving the efficiency of the entire training process. The effectiveness of the proposed TD-BC algorithm is validated on both single-agent and multi-agent cases. In the simulation, we use behavior trees derived from expert experience to generate historical data. The results show that the TD-BC algorithm can learn expert experience, which provides a high starting point for training and thus accelerates the RL process.
KW - accelerated training
KW - behavior cloning
KW - deep reinforcement learning
KW - imitation learning
UR - http://www.scopus.com/inward/record.url?scp=85189348821&partnerID=8YFLogxK
U2 - 10.1109/CAC59555.2023.10450935
DO - 10.1109/CAC59555.2023.10450935
M3 - Conference contribution
AN - SCOPUS:85189348821
T3 - Proceedings - 2023 China Automation Congress, CAC 2023
SP - 6776
EP - 6781
BT - Proceedings - 2023 China Automation Congress, CAC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 China Automation Congress, CAC 2023
Y2 - 17 November 2023 through 19 November 2023
ER -
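
Editorial note on the abstract's method description: the abstract states that TD-BC trains a policy network to match expert actions (a BC loss) and a value network from the TD error between consecutive states, using expert transitions rather than online rollouts. The sketch below illustrates one plausible reading of that update, in PyTorch. It is not the paper's implementation: the network shapes, the discrete-action cross-entropy BC loss, the hypothetical names (PolicyNet, ValueNet, td_bc_update), and the standard one-step TD target r + gamma * V(s') are all assumptions, since the abstract describes the TD error only loosely.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    # Illustrative policy network: state -> action logits (discrete actions assumed).
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, s):
        return self.net(s)

class ValueNet(nn.Module):
    # Illustrative value network: state -> scalar value estimate.
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def td_bc_update(policy, value, pi_opt, v_opt, batch, gamma=0.99):
    # batch holds expert transitions from historical simulation data
    # (e.g. generated by behavior trees, as in the paper's experiments).
    s, a, r, s_next, done = batch

    # BC step: push the policy's output toward the expert's action.
    bc_loss = F.cross_entropy(policy(s), a)
    pi_opt.zero_grad()
    bc_loss.backward()
    pi_opt.step()

    # TD step: regress V(s) toward a one-step TD target built from the
    # same expert transitions (assumed form; the abstract only mentions
    # the difference between next-state and current-state value outputs).
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * value(s_next)
    td_loss = F.mse_loss(value(s), target)
    v_opt.zero_grad()
    td_loss.backward()
    v_opt.step()

    return bc_loss.item(), td_loss.item()

After pretraining both networks this way on the expert dataset, they would serve as the initialization ("high starting point") for a standard actor-critic fine-tuning phase, which is the acceleration effect the abstract claims.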