An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control

Huaqing Zhang, Hongbin Ma*, Ying Jin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Citations (Scopus)

Abstract

When a robot uses reinforcement learning (RL) to learn a behavior policy, the RL algorithm must be able to learn a near-optimal policy model from limited interaction data. In this paper, we present an off-policy actor-critic deep RL algorithm based on the maximum entropy RL framework. In the policy improvement step, we derive an off-policy likelihood ratio policy gradient method in which, for each sampled state, actions are drawn from both the current policy model and the experience replay buffer. This method makes full use of past experience. Moreover, we design a unified critic network that simultaneously approximates the state-value and action-value functions. On a range of continuous control benchmarks, the results show that our method outperforms the state-of-the-art soft actor-critic (SAC) algorithm in stability and asymptotic performance.
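As a rough illustration of why a single critic can reasonably serve both value functions, the sketch below computes the standard maximum entropy relation V(s) = E_{a~pi}[Q(s, a) - alpha * log pi(a|s)], which ties the state value to the action values. It is not the authors' unified critic network: the function name `soft_state_value`, the temperature `alpha`, and the use of a small discrete action set (the paper targets continuous control) are all assumptions made only for this example.

```python
import math


def soft_state_value(action_probs, q_values, alpha):
    """Soft state value under the maximum entropy RL framework.

    Computes V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))
    exactly for a discrete action set (an assumption of this sketch).
    """
    if len(action_probs) != len(q_values):
        raise ValueError("action_probs and q_values must have equal length")
    value = 0.0
    for prob, q in zip(action_probs, q_values):
        # Actions with zero probability contribute nothing (0 * log 0 := 0).
        if prob > 0.0:
            value += prob * (q - alpha * math.log(prob))
    return value


# Hypothetical two-action state: uniform policy, Q-values 1.0 and 3.0.
v_plain = soft_state_value([0.5, 0.5], [1.0, 3.0], alpha=0.0)
v_soft = soft_state_value([0.5, 0.5], [1.0, 3.0], alpha=1.0)
```

With `alpha=0.0` the result is the ordinary expected action value; a positive `alpha` adds the policy entropy scaled by the temperature, which is the bonus that distinguishes maximum entropy methods such as SAC.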

Original language: English
Title of host publication: Intelligent Robotics and Applications - 15th International Conference, ICIRA 2022, Proceedings
Editors: Honghai Liu, Weihong Ren, Zhouping Yin, Lianqing Liu, Li Jiang, Guoying Gu, Xinyu Wu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 449-458
Number of pages: 10
ISBN (Print): 9783031138409
Publication status: Published - 2022
Event: 15th International Conference on Intelligent Robotics and Applications, ICIRA 2022 - Harbin, China
Duration: 1 Aug 2022 to 3 Aug 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13458 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Intelligent Robotics and Applications, ICIRA 2022
Country/Territory: China
City: Harbin
Period: 1/08/22 to 3/08/22

Keywords

  • A unified critic network
  • Deep reinforcement learning
  • Robotic control
