A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation

Zuowei Wang, Xiaozhong Liao, Fengdi Zhang, Min Xu, Yanmin Liu, Xiangdong Liu, Xi Zhang, Rui Wei Dong, Zhen Li

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Stochastic policy-based deep reinforcement learning (DRL) has been widely applied, but it requires extensive stochastic exploration to learn the environment in the initial training stage. When the agent is exposed to a more complex environment, the approach is not only inefficient but may also suffer from high variance. This paper develops a framework that accelerates training and reduces variance by introducing a stochastic switching network, which allows the agent to choose between heuristic actions and actions output by the proximal policy optimization (PPO) algorithm. Instead of starting from random actions, the agent is guided by the heuristic actions, so its navigation capability can be bootstrapped rapidly. The switching network is trained with the vanilla policy gradient (VPG) algorithm and can be trained jointly with the baseline PPO. In an experimental comparison with the baseline PPO in a customized maze environment built with the OpenAI Gym toolkit, our method performs the navigation task more efficiently, owing to the guidance provided by the heuristic actions.
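
The action-switching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a single logistic unit stands in for the switching network, and the names `SwitchingPolicy`, `select_action`, `heuristic_fn`, and `learned_fn` (a placeholder for the PPO policy), as well as the learning rate and discount factor, are assumptions made for illustration. The update is a plain REINFORCE-style (vanilla policy gradient) step on the switch alone.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SwitchingPolicy:
    """Bernoulli switch: 1 selects the heuristic action, 0 the learned one.

    Hypothetical stand-in for a switching network (one logistic unit).
    """

    def __init__(self, n_features, lr=0.1, seed=0):
        self.w = [0.0] * n_features  # zero weights give a 50/50 switch
        self.lr = lr                 # assumed learning rate
        self.rng = random.Random(seed)

    def prob_heuristic(self, obs):
        """Probability of choosing the heuristic action for this observation."""
        return sigmoid(sum(w * x for w, x in zip(self.w, obs)))

    def sample(self, obs):
        return 1 if self.rng.random() < self.prob_heuristic(obs) else 0

    def update(self, trajectory, gamma=0.99):
        """One vanilla policy gradient step.

        trajectory: list of (obs, switch, reward) tuples from one episode.
        """
        ret = 0.0
        grads = [0.0] * len(self.w)
        for obs, switch, reward in reversed(trajectory):
            ret = reward + gamma * ret  # discounted return from this step
            p = self.prob_heuristic(obs)
            # Gradient of log Bernoulli(switch; p) w.r.t. w is (switch - p) * obs.
            for i, x in enumerate(obs):
                grads[i] += (switch - p) * x * ret
        # Gradient ascent on the expected return.
        self.w = [w + self.lr * g for w, g in zip(self.w, grads)]


def select_action(switch_policy, obs, heuristic_fn, learned_fn):
    """Sample the switch, then take the action from the chosen source."""
    switch = switch_policy.sample(obs)
    action = heuristic_fn(obs) if switch == 1 else learned_fn(obs)
    return action, switch
```

In a training loop under these assumptions, each step would call `select_action`, record `(obs, switch, reward)`, and call `update` at the end of the episode, while the learned policy is trained separately on the same experience.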

Original language: English
Title of host publication: Proceedings of the 40th Chinese Control Conference, CCC 2021
Editors: Chen Peng, Jian Sun
Publisher: IEEE Computer Society
Pages: 3491-3496
Number of pages: 6
ISBN (electronic): 9789881563804
DOI
Publication status: Published - 26 Jul 2021
Event: 40th Chinese Control Conference, CCC 2021 - Shanghai, China
Duration: 26 Jul 2021 to 28 Jul 2021

Publication series

Name: Chinese Control Conference, CCC
2021-July
ISSN (print): 1934-1768
ISSN (electronic): 2161-2927

Conference

Conference: 40th Chinese Control Conference, CCC 2021
Country/Territory: China
City: Shanghai
Period: 26/07/21 to 28/07/21
