A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation

Zuowei Wang; Xiaozhong Liao; Fengdi Zhang; Min Xu; Yanmin Liu; Xiangdong Liu; Xi Zhang; Rui Wei Dong; Zhen Li

doi:10.23919/CCC52363.2021.9549631

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Stochastic policy-based deep reinforcement learning (DRL) has been widely applied, but it demands extensive stochastic exploration to learn the environment at the initial training stage. When the agent is exposed to a more complex environment, not only is the methodology inefficient, but its performance may also suffer from high variance. This paper develops a framework that accelerates training and reduces variance by introducing a stochastic switching network, which allows the agent to choose between heuristic actions and actions output by the proximal policy optimization (PPO) algorithm. Instead of starting from random actions, the agent is guided by the heuristic actions so that its navigation capability can be rapidly bootstrapped. The vanilla policy gradient (VPG) algorithm is used to train the switching network, which can be trained jointly with the baseline PPO. In an experimental comparison with the baseline PPO in a customized maze environment built with the OpenAI Gym toolkit, our method, guided by the heuristic actions, executes the navigation task more efficiently.

Original language	English
Title of host publication	Proceedings of the 40th Chinese Control Conference, CCC 2021
Editors	Chen Peng, Jian Sun
Publisher	IEEE Computer Society
Pages	3491-3496
Number of pages	6
ISBN (Electronic)	9789881563804
DOI	https://doi.org/10.23919/CCC52363.2021.9549631
Publication status	Published - 26 Jul 2021
Event	40th Chinese Control Conference, CCC 2021 - Shanghai, China; Duration: 26 Jul 2021 → 28 Jul 2021

Publication series

Name	Chinese Control Conference, CCC
Volume	2021-July
ISSN (Print)	1934-1768
ISSN (Electronic)	2161-2927

Conference

Conference	40th Chinese Control Conference, CCC 2021
Country/Territory	China
City	Shanghai
Period	26/07/21 → 28/07/21

Cite this

Wang, Z., Liao, X., Zhang, F., Xu, M., Liu, Y., Liu, X., Zhang, X., Dong, R. W., & Li, Z. (2021). A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation. In C. Peng, & J. Sun (Eds.), Proceedings of the 40th Chinese Control Conference, CCC 2021 (pp. 3491-3496). (Chinese Control Conference, CCC; Vol. 2021-July). IEEE Computer Society. https://doi.org/10.23919/CCC52363.2021.9549631

@inproceedings{b18341945ac24c2dbf43b901d1d923e7,

title = "A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation",

abstract = "Stochastic policy-based deep reinforcement learning (DRL) has successfully gained the widespread application but demands plenty of stochastic exploration to learn the environment at the initial training stage. When the agent is exposed to more complex environment, not only is the methodology inefficient, but its performance may also suffer from the issue of high variance. This paper develops a framework to accelerate the training procedure and reduce the variance by introducing a stochastic switching network, which specifically allows the agent to choose between heuristic actions and actions output by proximal policy optimization (PPO) algorithm. Instead of starting from the random actions, the agent can be effectively guided by the heuristic actions so that the navigation capability of the agent can be rapidly bootstrapped. The vanilla policy gradient (VPG) algorithm is further utilized to train the switching network, which can be jointly trained with the baseline PPO. By the experimental comparison with the baseline PPO in the customized maze environment with openAI Gym toolkit, our method greatly contributes to the more efficient execution of navigation task by means of the heuristic actions for guidance.",

keywords = "Robot navigation, Vanilla policy gradient (VPG), action switching, deep reinforcement learning (DRL), proximal policy optimization (PPO)",

author = "Zuowei Wang and Xiaozhong Liao and Fengdi Zhang and Min Xu and Yanmin Liu and Xiangdong Liu and Xi Zhang and Dong, {Rui Wei} and Zhen Li",

note = "Publisher Copyright: {\textcopyright} 2021 Technical Committee on Control Theory, Chinese Association of Automation.; 40th Chinese Control Conference, CCC 2021 ; Conference date: 26-07-2021 Through 28-07-2021",

year = "2021",

month = jul,

day = "26",

doi = "10.23919/CCC52363.2021.9549631",

language = "English",

series = "Chinese Control Conference, CCC",

publisher = "IEEE Computer Society",

pages = "3491--3496",

editor = "Chen Peng and Jian Sun",

booktitle = "Proceedings of the 40th Chinese Control Conference, CCC 2021",

address = "United States",

}

Wang, Z, Liao, X, Zhang, F, Xu, M, Liu, Y, Liu, X, Zhang, X, Dong, RW & Li, Z 2021, A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation. in C Peng & J Sun (eds), Proceedings of the 40th Chinese Control Conference, CCC 2021. Chinese Control Conference, CCC, vol. 2021-July, IEEE Computer Society, pp. 3491-3496, 40th Chinese Control Conference, CCC 2021, Shanghai, China, 26/07/21. https://doi.org/10.23919/CCC52363.2021.9549631

A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation. / Wang, Zuowei; Liao, Xiaozhong; Zhang, Fengdi et al.
Proceedings of the 40th Chinese Control Conference, CCC 2021. ed. / Chen Peng; Jian Sun. IEEE Computer Society, 2021. p. 3491-3496 (Chinese Control Conference, CCC; Vol. 2021-July).

TY - GEN

T1 - A Deep Reinforcement Learning Method with Action Switching for Autonomous Navigation

AU - Wang, Zuowei

AU - Liao, Xiaozhong

AU - Zhang, Fengdi

AU - Xu, Min

AU - Liu, Yanmin

AU - Liu, Xiangdong

AU - Zhang, Xi

AU - Dong, Rui Wei

AU - Li, Zhen

PY - 2021/7/26

Y1 - 2021/7/26

N2 - Stochastic policy-based deep reinforcement learning (DRL) has successfully gained the widespread application but demands plenty of stochastic exploration to learn the environment at the initial training stage. When the agent is exposed to more complex environment, not only is the methodology inefficient, but its performance may also suffer from the issue of high variance. This paper develops a framework to accelerate the training procedure and reduce the variance by introducing a stochastic switching network, which specifically allows the agent to choose between heuristic actions and actions output by proximal policy optimization (PPO) algorithm. Instead of starting from the random actions, the agent can be effectively guided by the heuristic actions so that the navigation capability of the agent can be rapidly bootstrapped. The vanilla policy gradient (VPG) algorithm is further utilized to train the switching network, which can be jointly trained with the baseline PPO. By the experimental comparison with the baseline PPO in the customized maze environment with openAI Gym toolkit, our method greatly contributes to the more efficient execution of navigation task by means of the heuristic actions for guidance.

AB - Stochastic policy-based deep reinforcement learning (DRL) has successfully gained the widespread application but demands plenty of stochastic exploration to learn the environment at the initial training stage. When the agent is exposed to more complex environment, not only is the methodology inefficient, but its performance may also suffer from the issue of high variance. This paper develops a framework to accelerate the training procedure and reduce the variance by introducing a stochastic switching network, which specifically allows the agent to choose between heuristic actions and actions output by proximal policy optimization (PPO) algorithm. Instead of starting from the random actions, the agent can be effectively guided by the heuristic actions so that the navigation capability of the agent can be rapidly bootstrapped. The vanilla policy gradient (VPG) algorithm is further utilized to train the switching network, which can be jointly trained with the baseline PPO. By the experimental comparison with the baseline PPO in the customized maze environment with openAI Gym toolkit, our method greatly contributes to the more efficient execution of navigation task by means of the heuristic actions for guidance.

KW - Robot navigation

KW - Vanilla policy gradient (VPG)

KW - action switching

KW - deep reinforcement learning (DRL)

KW - proximal policy optimization (PPO)

UR - http://www.scopus.com/inward/record.url?scp=85117266141&partnerID=8YFLogxK

U2 - 10.23919/CCC52363.2021.9549631

DO - 10.23919/CCC52363.2021.9549631

M3 - Conference contribution

AN - SCOPUS:85117266141

T3 - Chinese Control Conference, CCC

SP - 3491

EP - 3496

BT - Proceedings of the 40th Chinese Control Conference, CCC 2021

A2 - Peng, Chen

A2 - Sun, Jian

PB - IEEE Computer Society

T2 - 40th Chinese Control Conference, CCC 2021

Y2 - 26 July 2021 through 28 July 2021

ER -
