An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control

Huaqing Zhang, Hongbin Ma*, Ying Jin

*Corresponding author of this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

When a robot uses reinforcement learning (RL) to learn a behavior policy, the RL algorithm must learn a near-optimal policy model from limited interaction data. In this paper, we present an off-policy actor-critic deep RL algorithm based on the maximum entropy RL framework. In the policy improvement step, an off-policy likelihood ratio policy gradient method is derived, in which actions are sampled simultaneously from the current policy model and from the experience replay buffer according to the sampled states; this makes full use of past experience. Moreover, we design a unified critic network that simultaneously approximates the state-value and action-value functions. On a range of continuous control benchmarks, the results show that our method outperforms the state-of-the-art soft actor-critic (SAC) algorithm in stability and asymptotic performance.
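The abstract's policy-improvement idea — a likelihood-ratio (score-function) gradient of an entropy-regularized objective, estimated from both fresh on-policy actions and importance-weighted actions replayed from a buffer — can be illustrated with a minimal numpy sketch. This assumes a 1-D Gaussian policy; all names (`soft_pg_estimate`, `q_fn`, the 0.5/0.5 mixing of the two estimators) are illustrative choices for this sketch, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_prob(a, mu, sigma):
    """Log-density of a 1-D Gaussian policy pi(a|s) = N(mu, sigma^2)."""
    return -0.5 * ((a - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def soft_pg_estimate(mu, sigma, q_fn, buffer_actions, buffer_mu, buffer_sigma,
                     alpha=0.2, n_onpolicy=2000):
    """Likelihood-ratio estimate of d/d_mu of the entropy-regularized objective,
    mixing on-policy samples with replayed (off-policy) actions."""
    # Score function d log pi / d mu for a Gaussian policy.
    score = lambda a: (a - mu) / sigma ** 2
    # Soft target of maximum-entropy RL: Q(s, a) - alpha * log pi(a|s).
    target = lambda a: q_fn(a) - alpha * log_prob(a, mu, sigma)

    # (1) Fresh actions sampled from the current policy.
    a_on = mu + sigma * rng.standard_normal(n_onpolicy)
    grad_on = np.mean(score(a_on) * target(a_on))

    # (2) Replayed actions were drawn under an older policy; correct the
    # distribution mismatch with importance ratios pi(a|s) / pi_old(a|s).
    w = np.exp(log_prob(buffer_actions, mu, sigma)
               - log_prob(buffer_actions, buffer_mu, buffer_sigma))
    grad_off = np.mean(w * score(buffer_actions) * target(buffer_actions))

    # Simple equal-weight mix of the two estimators (a sketch-level choice).
    return 0.5 * (grad_on + grad_off)

# Illustrative use: a quadratic "Q" preferring actions near a = 1, and a
# buffer filled under an older policy N(0.5, 1).
q_fn = lambda a: -(a - 1.0) ** 2
buffer_actions = 0.5 + rng.standard_normal(2000)
grad_mu = soft_pg_estimate(mu=0.0, sigma=1.0, q_fn=q_fn,
                           buffer_actions=buffer_actions,
                           buffer_mu=0.5, buffer_sigma=1.0)
# The estimate pushes mu toward the rewarding region, so grad_mu is positive.
```

The importance ratio is what lets replayed actions contribute an unbiased gradient signal despite having been generated by an older policy, which is the sense in which the method "makes full use of past experience."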

Original language: English
Title of host publication: Intelligent Robotics and Applications - 15th International Conference, ICIRA 2022, Proceedings
Editors: Honghai Liu, Weihong Ren, Zhouping Yin, Lianqing Liu, Li Jiang, Guoying Gu, Xinyu Wu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 449-458
Number of pages: 10
ISBN (Print): 9783031138409
DOI
Publication status: Published - 2022
Event: 15th International Conference on Intelligent Robotics and Applications, ICIRA 2022 - Harbin, China
Duration: 1 Aug 2022 → 3 Aug 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13458 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Intelligent Robotics and Applications, ICIRA 2022
Country/Territory: China
City: Harbin
Period: 1/08/22 → 3/08/22

