Guided Model-Based Policy Search Method for Fast Motor Learning of Robots With Learned Dynamics

Xiao Huang, Xingfang Wang, Yan Zhao, Jiachen Hu, Hui Li, Zhihong Jiang

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Reinforcement learning has recently achieved impressive success in enabling robots to learn complex motor skills in simulation. However, most of these successes are difficult to transfer to physical robots, since current algorithms require extensive real-world training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic implementation in the brain. The approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods while accelerating the slow learning of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks; it allows the action cycle time to be controlled accurately and makes MB planning feasible on physical robots. Simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks within a few minutes, with better policies and higher learning efficiency.

Note to Practitioners: Reinforcement learning is becoming a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plants. However, low learning efficiency severely limits its application to practical robots, which must adapt quickly to dynamically changing environments under micro-data conditions. To solve the inefficiency of physical robots learning from scratch, this article proposes a fused MF and MB control algorithm inspired by a hypothetical MF and MB actor-critic implementation in the brain. The motion decision process is modeled as an optimization problem with inequality constraints. The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term one and overcoming the suboptimality of conventional MB methods. The MB policy is searched with the quadratic penalty method under the guidance of the MF policy, which improves the quality of the policy at every decision-making step. Moreover, since the dynamics model is fitted by a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also offers a feasible solution for controlling robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.
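To make the formulation above concrete, the sketch below illustrates one plausible reading of the guided MB objective described in the abstract: a short-horizon rollout cost under the learned dynamics, extended into a long-term objective by a terminal MF value term, with a quadratic penalty pulling the searched action toward the MF policy's suggestion. This is a minimal sketch under stated assumptions, not the paper's implementation: all names (guided_mb_action, dynamics, cost, value_fn, mf_policy), the horizon, the penalty weight rho, and the random-search optimizer are illustrative placeholders, and the paper's exact penalty schedule and optimizer are not given in this record.

```python
import numpy as np

def guided_mb_action(s0, dynamics, cost, value_fn, mf_policy,
                     horizon=5, rho=1.0, iters=50, step=0.05):
    """Hypothetical guided model-based action search.

    Minimizes short-horizon rollout cost under a learned dynamics model,
    minus a terminal model-free value (extending the effective horizon),
    plus a quadratic penalty keeping the first action near the model-free
    policy's suggestion. All callables are assumed interfaces, not the
    paper's API: dynamics(s, a) -> next state (mean prediction of the
    probabilistic model), cost(s, a) -> float, value_fn(s) -> float,
    mf_policy(s) -> action array.
    """
    rng = np.random.default_rng(0)
    a_dim = mf_policy(s0).shape[0]
    actions = rng.normal(scale=0.1, size=(horizon, a_dim))

    def objective(acts):
        s, total = s0, 0.0
        for a in acts:
            total += cost(s, a)
            s = dynamics(s, a)
        total -= value_fn(s)  # MF value turns the short horizon long-term
        # Quadratic penalty: guide the MB search toward the MF policy.
        total += rho * np.sum((acts[0] - mf_policy(s0)) ** 2)
        return total

    # Simple random-search refinement; a real implementation might use
    # CEM or gradient-based optimization with an iterative penalty update.
    best, best_val = actions, objective(actions)
    for _ in range(iters):
        cand = best + rng.normal(scale=step, size=best.shape)
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val

    # One-step predictive control: execute only the first action, then
    # re-plan at the next decision step with fresh sensor information.
    return best[0]
```

Only the first action of the optimized sequence is executed before re-planning, which mirrors the one-step predictive control framework the abstract describes for coping with delayed sensorimotor information.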

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE Transactions on Automation Science and Engineering
DOIs
Publication status: Accepted/In press - 2024

Keywords

  • Reinforcement learning
  • brain-inspired computing
  • model-based policy search
  • motor learning

