Guided Model-Based Policy Search Method for Fast Motor Learning of Robots With Learned Dynamics

Xiao Huang; Xingfang Wang; Yan Zhao; Jiachen Hu; Hui Li; Zhihong Jiang

doi:10.1109/TASE.2024.3352580

Xiao Huang, Xingfang Wang, Yan Zhao, Jiachen Hu, Hui Li^*, Zhihong Jiang^*

^* Corresponding author for this work

School of Mechatronical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Reinforcement learning has recently achieved impressive success in allowing robots to learn complex motor skills in simulation environments. However, most of these successes are difficult to transfer to physical robots, since current algorithms require a large amount of practical training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic brain implementation. This approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods and accelerating the learning of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks. This helps to accurately control the action cycle time and ensures the feasibility of MB planning for physical robots. Simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks in a few minutes, with better policies and higher learning efficiency.

Reinforcement learning is a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plants. However, low learning efficiency severely limits its application to practical robots, which have to adapt quickly to dynamically changing environments in micro-data situations. To solve the inefficiency problem of physical robots learning from scratch, this paper proposes an MF and MB fusion control algorithm inspired by a hypothetical MF and MB actor-critic brain implementation. The motion decision process is modeled as an optimization problem with inequality constraints. The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term version to overcome the suboptimality of conventional MB methods. The MB policy is searched using the quadratic penalty method under the guidance of the MF policy, which helps improve the quality of the policy at every decision-making step. Moreover, since the model dynamics are fitted by a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also provides a feasible solution for the control of various robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.
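The idea of combining a one-step model-based objective with a model-free value function, and searching the action with a quadratic penalty method, can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar dynamics, reward, critic, action bound, step-size rule, and all function names below are hypothetical stand-ins, and using the MF policy's action as the starting point of the search is only one possible reading of "guided". The inequality constraint assumed here is a simple action limit |a| <= a_max.

```python
GOAL = 1.0    # hypothetical target position
GAMMA = 0.99  # discount factor (assumed)


def dynamics(s, a, dt=0.1):
    """Hypothetical stand-in for a learned one-step dynamics model."""
    return s + dt * a


def reward(s, a):
    """Toy reward: distance to the goal plus a small action cost."""
    return -(s - GOAL) ** 2 - 0.01 * a ** 2


def value(s):
    """Hypothetical stand-in for a model-free critic V(s)."""
    return -5.0 * (s - GOAL) ** 2


def objective(s, a):
    """One-step reward extended by the critic's estimate at the predicted state."""
    return reward(s, a) + GAMMA * value(dynamics(s, a))


def penalized_cost(s, a, mu, a_max):
    """Negated objective plus a quadratic penalty for violating |a| <= a_max."""
    violation = max(0.0, a - a_max) ** 2 + max(0.0, -a - a_max) ** 2
    return -objective(s, a) + mu * violation


def guided_search(s, a_guide, a_max=2.0, mu=1.0, growth=10.0,
                  outer=5, inner=200, eps=1e-5):
    """Quadratic penalty search started from the model-free action a_guide."""
    a = a_guide
    for _ in range(outer):
        # The step shrinks as mu grows so that gradient descent stays
        # stable on the increasingly stiff penalty term.
        step = 1.0 / (1.0 + 2.0 * mu)
        for _ in range(inner):
            # Central-difference estimate of the gradient with respect to a.
            grad = (penalized_cost(s, a + eps, mu, a_max)
                    - penalized_cost(s, a - eps, mu, a_max)) / (2.0 * eps)
            a -= step * grad
        mu *= growth  # tighten the penalty for the next outer iteration
    return a


if __name__ == "__main__":
    action = guided_search(s=0.0, a_guide=0.5)
    print("action:", action, "objective:", objective(0.0, action))
```

In this toy problem the unconstrained optimum lies outside the action bound, so the search settles close to the bound. As is typical for penalty methods, the result may violate the constraint very slightly, and the violation shrinks as the penalty weight mu grows.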

Original language	English
Pages (from-to)	453-465
Number of pages	13
Journal	IEEE Transactions on Automation Science and Engineering
Volume	22
DOI	https://doi.org/10.1109/TASE.2024.3352580
Publication status	Published - 2025

Cite this

Huang, X., Wang, X., Zhao, Y., Hu, J., Li, H., & Jiang, Z. (2025). Guided Model-Based Policy Search Method for Fast Motor Learning of Robots With Learned Dynamics. IEEE Transactions on Automation Science and Engineering, 22, 453-465. https://doi.org/10.1109/TASE.2024.3352580

@article{d16bd9775b704179824cdc8725165367,
title = "Guided Model-Based Policy Search Method for Fast Motor Learning of Robots With Learned Dynamics",
abstract = "Reinforcement learning has recently achieved impressive success in allowing robots to learn complex motor skills in simulation environments. However, most of these successes are difficult to transfer to physical robots, since current algorithms require a large amount of practical training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic brain implementation. This approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods and accelerating the learning of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks. This helps to accurately control the action cycle time and ensures the feasibility of MB planning for physical robots. Simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks in a few minutes, with better policies and higher learning efficiency. Reinforcement learning is a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plants. However, low learning efficiency severely limits its application to practical robots, which have to adapt quickly to dynamically changing environments in micro-data situations. To solve the inefficiency problem of physical robots learning from scratch, this paper proposes an MF and MB fusion control algorithm inspired by a hypothetical MF and MB actor-critic brain implementation. The motion decision process is modeled as an optimization problem with inequality constraints. The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term version to overcome the suboptimality of conventional MB methods. The MB policy is searched using the quadratic penalty method under the guidance of the MF policy, which helps improve the quality of the policy at every decision-making step. Moreover, since the model dynamics are fitted by a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also provides a feasible solution for the control of various robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.",
keywords = "Reinforcement learning, brain-inspired computing, model-based policy search, motor learning",
author = "Xiao Huang and Xingfang Wang and Yan Zhao and Jiachen Hu and Hui Li and Zhihong Jiang",
note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",
year = "2025",
doi = "10.1109/TASE.2024.3352580",
language = "English",
volume = "22",
pages = "453--465",
journal = "IEEE Transactions on Automation Science and Engineering",
issn = "1545-5955",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR
T1 - Guided Model-Based Policy Search Method for Fast Motor Learning of Robots With Learned Dynamics
AU - Huang, Xiao
AU - Wang, Xingfang
AU - Zhao, Yan
AU - Hu, Jiachen
AU - Li, Hui
AU - Jiang, Zhihong
PY - 2025
Y1 - 2025
AB - Reinforcement learning has recently achieved impressive success in allowing robots to learn complex motor skills in simulation environments. However, most of these successes are difficult to transfer to physical robots, since current algorithms require a large amount of practical training and complex sim-to-real transfer techniques. To improve the learning efficiency and adaptability of physical robots, this article proposes a guided model-based policy search (GMBPS) algorithm inspired by a hypothetical model-free (MF) and model-based (MB) actor-critic brain implementation. This approach bridges the gap between MF and MB control processes, overcoming the suboptimality of MB methods and accelerating the learning of MF methods. Additionally, a one-step predictive control framework is proposed to minimize the impact of delayed sensorimotor information in real-world tasks. This helps to accurately control the action cycle time and ensures the feasibility of MB planning for physical robots. Simulation and experimental results demonstrate that the proposed approach enables a 6-DOF UR5e robot arm to learn various reaching tasks in a few minutes, with better policies and higher learning efficiency. Reinforcement learning is a popular framework that allows robots to learn complex motor skills without building analytical models of the controlled plants. However, low learning efficiency severely limits its application to practical robots, which have to adapt quickly to dynamically changing environments in micro-data situations. To solve the inefficiency problem of physical robots learning from scratch, this paper proposes an MF and MB fusion control algorithm inspired by a hypothetical MF and MB actor-critic brain implementation. The motion decision process is modeled as an optimization problem with inequality constraints. The global MF value function is incorporated into the MB objective function, extending the short-term optimization into a long-term version to overcome the suboptimality of conventional MB methods. The MB policy is searched using the quadratic penalty method under the guidance of the MF policy, which helps improve the quality of the policy at every decision-making step. Moreover, since the model dynamics are fitted by a probabilistic neural network, the proposed method is not only applicable to joint-driven robots but also provides a feasible solution for the control of various robotic systems with complex dynamics, such as soft robots and musculoskeletal robots.
KW - Reinforcement learning
KW - brain-inspired computing
KW - model-based policy search
KW - motor learning
UR - http://www.scopus.com/inward/record.url?scp=85182925912&partnerID=8YFLogxK
U2 - 10.1109/TASE.2024.3352580
DO - 10.1109/TASE.2024.3352580
M3 - Article
AN - SCOPUS:85182925912
SN - 1545-5955
VL - 22
SP - 453
EP - 465
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
ER -
