Guided Model-Based Policy Search Method for Fast Motor Learning of Robots With Learned Dynamics

Xiao Huang, Xingfang Wang, Yan Zhao, Jiachen Hu, Hui Li, Zhihong Jiang

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Reinforcement learning has recently achieved impressive success in enabling robots to learn complex motor skills in simulation. However, most of these successes are difficult to transfer to physical robots, since current algorithms require extensive real-world training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic implementation in the brain. The approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods while accelerating the slow learning of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks; it allows the action cycle time to be controlled accurately and makes MB planning feasible on physical robots. Simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks within a few minutes, with better policies and higher learning efficiency.

Note to Practitioners: Reinforcement learning is becoming a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plants. However, low learning efficiency severely limits its application to practical robots, which must adapt quickly to dynamically changing environments under micro-data conditions. To solve the inefficiency of physical robots learning from scratch, this article proposes a fused MF and MB control algorithm inspired by a hypothetical MF and MB actor-critic implementation in the brain. The motion decision process is modeled as an optimization problem with inequality constraints. The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term one and overcoming the suboptimality of conventional MB methods. The MB policy is searched with the quadratic penalty method under the guidance of the MF policy, which improves the quality of the policy at every decision-making step. Moreover, since the dynamics model is fitted by a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also offers a feasible solution for controlling robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.
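To make the formulation above concrete, the sketch below illustrates one plausible reading of the guided MB objective described in the abstract: a short-horizon rollout cost under the learned dynamics, extended into a long-term objective by a terminal MF value term, with a quadratic penalty pulling the searched action toward the MF policy's suggestion. This is a minimal sketch under stated assumptions, not the paper's implementation: all names (guided_mb_action, dynamics, cost, value_fn, mf_policy), the horizon, the penalty weight rho, and the random-search optimizer are illustrative placeholders, and the paper's exact penalty schedule and optimizer are not given in this record.

```python
import numpy as np

def guided_mb_action(s0, dynamics, cost, value_fn, mf_policy,
                     horizon=5, rho=1.0, iters=50, step=0.05):
    """Hypothetical guided model-based action search.

    Minimizes short-horizon rollout cost under a learned dynamics model,
    minus a terminal model-free value (extending the effective horizon),
    plus a quadratic penalty keeping the first action near the model-free
    policy's suggestion. All callables are assumed interfaces, not the
    paper's API: dynamics(s, a) -> next state (mean prediction of the
    probabilistic model), cost(s, a) -> float, value_fn(s) -> float,
    mf_policy(s) -> action array.
    """
    rng = np.random.default_rng(0)
    a_dim = mf_policy(s0).shape[0]
    actions = rng.normal(scale=0.1, size=(horizon, a_dim))

    def objective(acts):
        s, total = s0, 0.0
        for a in acts:
            total += cost(s, a)
            s = dynamics(s, a)
        total -= value_fn(s)  # MF value turns the short horizon long-term
        # Quadratic penalty: guide the MB search toward the MF policy.
        total += rho * np.sum((acts[0] - mf_policy(s0)) ** 2)
        return total

    # Simple random-search refinement; a real implementation might use
    # CEM or gradient-based optimization with an iterative penalty update.
    best, best_val = actions, objective(actions)
    for _ in range(iters):
        cand = best + rng.normal(scale=step, size=best.shape)
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val

    # One-step predictive control: execute only the first action, then
    # re-plan at the next decision step with fresh sensor information.
    return best[0]
```

Only the first action of the optimized sequence is executed before re-planning, which mirrors the one-step predictive control framework the abstract describes for coping with delayed sensorimotor information.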

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE Transactions on Automation Science and Engineering
DOIs
Publication status: Accepted/In press - 2024

Keywords

  • Reinforcement learning
  • brain-inspired computing
  • model-based policy search
  • motor learning

