A goal-conditioned policy search method with multi-timescale value function tuning

Zhihong Jiang; Jiachen Hu; Yan Zhao; Xiao Huang; Hui Li

doi:10.1108/RIA-11-2023-0167

Zhihong Jiang, Jiachen Hu, Yan Zhao, Xiao Huang*, Hui Li

*Corresponding author for this work

School of Mechatronical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Purpose: Current reinforcement learning (RL) algorithms suffer from low learning efficiency and poor generalization, which significantly limits their practical application on real robots. This paper proposes a hybrid model-based and model-free policy search method with multi-timescale value function tuning, so that robots can learn complex motion planning skills in multi-goal, multi-constraint environments from only a few interactions.

Design/methodology/approach: A goal-conditioned, hybrid model-based and model-free policy search method with multi-timescale value function tuning is proposed. First, the authors construct a multi-goal, multi-constraint policy optimization approach that fuses model-based policy optimization with goal-conditioned, model-free learning. Soft constraints on states and controls are applied to ensure fast and stable policy iteration. Second, an uncertainty-aware multi-timescale value function learning method is proposed: it constructs a multi-timescale value function network and adaptively chooses the planning timescale of the value function according to the value prediction uncertainty. This implicitly reduces the complexity of the value representation and improves the generalization performance of the policy.

Findings: The algorithm enables physical robots to learn generalized skills in real-world environments within a handful of trials. Simulation and experimental results show that the algorithm outperforms other relevant model-based and model-free RL algorithms.

Originality/value: This paper combines goal-conditioned RL and the model predictive path integral method into a unified model-based policy search framework, which improves the learning efficiency and policy optimality of motor skill learning in multi-goal, multi-constraint environments. An uncertainty-aware multi-timescale value function learning and selection method is proposed to overcome long-horizon problems, improve the resolution of the optimal policy and thereby enhance the generalization ability of goal-conditioned RL.

Original language	English
Pages (from-to)	549-559
Number of pages	11
Journal	Robotic Intelligence and Automation
Volume	44
Issue number	4
DOIs	https://doi.org/10.1108/RIA-11-2023-0167
Publication status	Published - 18 Jul 2024

Keywords

Bioinspired learning
Goal-conditioned reinforcement learning
Model-based policy search
Multi-timescale value function

Cite this

Jiang, Z., Hu, J., Zhao, Y., Huang, X., & Li, H. (2024). A goal-conditioned policy search method with multi-timescale value function tuning. Robotic Intelligence and Automation, 44(4), 549-559. https://doi.org/10.1108/RIA-11-2023-0167

@article{6a1335ffe3974f4f9d544f7c4862c9cd,

title = "A goal-conditioned policy search method with multi-timescale value function tuning",

keywords = "Bioinspired learning, Goal-conditioned reinforcement learning, Model-based policy search, Multi-timescale value function",

author = "Zhihong Jiang and Jiachen Hu and Yan Zhao and Xiao Huang and Hui Li",

note = "Publisher Copyright: {\textcopyright} 2024, Emerald Publishing Limited.",

year = "2024",

month = jul,

day = "18",

doi = "10.1108/RIA-11-2023-0167",

language = "English",

volume = "44",

pages = "549--559",

journal = "Robotic Intelligence and Automation",

issn = "2754-6969",

publisher = "Emerald Publishing",

number = "4",

}

TY - JOUR

T1 - A goal-conditioned policy search method with multi-timescale value function tuning

AU - Jiang, Zhihong

AU - Hu, Jiachen

AU - Zhao, Yan

AU - Huang, Xiao

AU - Li, Hui

PY - 2024/7/18

Y1 - 2024/7/18

KW - Bioinspired learning

KW - Goal-conditioned reinforcement learning

KW - Model-based policy search

KW - Multi-timescale value function

UR - http://www.scopus.com/inward/record.url?scp=85195491620&partnerID=8YFLogxK

U2 - 10.1108/RIA-11-2023-0167

DO - 10.1108/RIA-11-2023-0167

M3 - Article

AN - SCOPUS:85195491620

SN - 2754-6969

VL - 44

SP - 549

EP - 559

JO - Robotic Intelligence and Automation

JF - Robotic Intelligence and Automation

IS - 4

ER -
