Modeling-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator

Shan Zhong; Jack Tan; Husheng Dong; Xuemei Chen; Shengrong Gong; Zhenjiang Qian

doi:10.1007/s10723-020-09512-4

Shan Zhong^*, Jack Tan, Husheng Dong, Xuemei Chen^*, Shengrong Gong, Zhenjiang Qian

^*Corresponding author for this work

Advanced Research Institute of Multidisciplinary Science

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Tasks with continuous state and action spaces are difficult to solve with high sample efficiency. Model learning and planning, a well-known way to improve sample efficiency, first learns a model of the system dynamics and then uses it for planning. However, if the dynamics model is not captured accurately, convergence slows and sample efficiency drops. To address problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with a Gaussian process approximator, named MLAC-GPA, is proposed. The Gaussian process is chosen as the modeling method because it captures the noise and uncertainty of the underlying system. The model in MLAC-GPA is first represented by linear function approximation and then modeled by a Gaussian process; the mean vector and covariance matrix of the model parameters are then estimated by Bayesian inference. Once learned, the model is used for planning to accelerate convergence of the value function and the policy. MLAC-GPA is implemented and compared with five representative methods on three classic benchmarks: Pole Balancing, Inverted Pendulum, and Mountain Car. The results show that MLAC-GPA outperforms the other methods in both learning rate and sample efficiency.
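The abstract's step of estimating a mean vector and covariance matrix for the parameters of a linear model can be illustrated with standard Bayesian linear regression. The sketch below is not the paper's implementation: the zero-mean isotropic prior, the known noise variance, the toy one-step dynamics, and all names (`bayes_linear_posterior`, `predict`, `prior_var`, `noise_var`) are assumptions made for illustration only.

```python
import numpy as np


def bayes_linear_posterior(Phi, y, prior_var=1.0, noise_var=0.1):
    """Posterior over w for y = Phi @ w + noise.

    Assumed (not from the paper): prior w ~ N(0, prior_var * I) and
    Gaussian noise with known variance noise_var.
    Returns the posterior mean vector and covariance matrix.
    """
    d = Phi.shape[1]
    precision = np.eye(d) / prior_var + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov


def predict(phi, mean, cov, noise_var=0.1):
    """Predictive mean and variance for a single feature vector phi."""
    return phi @ mean, phi @ cov @ phi + noise_var


# Toy data: next state is a noisy linear function of (state, action).
rng = np.random.default_rng(0)
true_w = np.array([0.9, 0.5])               # hypothetical dynamics weights
X = rng.uniform(-1.0, 1.0, size=(200, 2))   # columns: state, action
y = X @ true_w + rng.normal(0.0, 0.05, size=200)

mean, cov = bayes_linear_posterior(X, y, noise_var=0.05**2)
pred_mean, pred_var = predict(np.array([0.2, -0.1]), mean, cov, noise_var=0.05**2)
```

The covariance matrix is what lets a planner tell how far to trust a simulated transition: the predictive variance grows for feature vectors that are poorly covered by the observed data.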

Original language	English
Pages (from-to)	181-195
Number of pages	15
Journal	Journal of Grid Computing
Volume	18
Issue number	2
DOIs	https://doi.org/10.1007/s10723-020-09512-4
Publication status	Published - 1 Jun 2020

Keywords

Actor-critic
Gaussian process
Linear function approximation
Model learning
Planning

Cite this

Zhong, S., Tan, J., Dong, H., Chen, X., Gong, S., & Qian, Z. (2020). Modeling-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator. Journal of Grid Computing, 18(2), 181-195. https://doi.org/10.1007/s10723-020-09512-4

@article{54f3d387db6041378b9ee367ccfa85b3,

title = "Modeling-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator",

abstract = "The tasks with continuous state and action spaces are difficult to be solved with high sample efficiency. Model learning and planning, as a well-known method to improve the sample efficiency, is achieved by learning a system dynamics model first and then using it for planning. However, the convergence of the algorithm will be slowed if the system dynamics model is not captured accurately, with the consequence of low sample efficiency. Therefore, to solve the problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with the Gaussian process approximator is proposed, named MLAC-GPA, where the Gaussian process is selected as the modeling method due to its valuable characteristics of capturing the noise and uncertainty of the underlying system. The model in MLAC-GPA is firstly represented by linear function approximation and then modeled by the Gaussian process. Afterward, the expectation value vector and the covariance matrix of the model parameter are estimated by Bayesian reasoning. The model is used for planning after being learned, to accelerate the convergence of the value function and the policy. Experimentally, the proposed method MLAC-GPA is implemented and compared with five representative methods in three classic benchmarks, Pole Balancing, Inverted Pendulum, and Mountain Car. The result shows MLAC-GPA overcomes the others both in learning rate and sample efficiency.",

keywords = "Actor-critic, Gaussian process, Linear function approximation, Model learning, Planning",

author = "Shan Zhong and Jack Tan and Husheng Dong and Xuemei Chen and Shengrong Gong and Zhenjiang Qian",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature B.V.",

year = "2020",

month = jun,

day = "1",

doi = "10.1007/s10723-020-09512-4",

language = "English",

volume = "18",

pages = "181--195",

journal = "Journal of Grid Computing",

issn = "1570-7873",

publisher = "Springer Netherlands",

number = "2",

}

TY - JOUR

T1 - Modeling-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator

AU - Zhong, Shan

AU - Tan, Jack

AU - Dong, Husheng

AU - Chen, Xuemei

AU - Gong, Shengrong

AU - Qian, Zhenjiang

PY - 2020/6/1

Y1 - 2020/6/1

AB - The tasks with continuous state and action spaces are difficult to be solved with high sample efficiency. Model learning and planning, as a well-known method to improve the sample efficiency, is achieved by learning a system dynamics model first and then using it for planning. However, the convergence of the algorithm will be slowed if the system dynamics model is not captured accurately, with the consequence of low sample efficiency. Therefore, to solve the problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with the Gaussian process approximator is proposed, named MLAC-GPA, where the Gaussian process is selected as the modeling method due to its valuable characteristics of capturing the noise and uncertainty of the underlying system. The model in MLAC-GPA is firstly represented by linear function approximation and then modeled by the Gaussian process. Afterward, the expectation value vector and the covariance matrix of the model parameter are estimated by Bayesian reasoning. The model is used for planning after being learned, to accelerate the convergence of the value function and the policy. Experimentally, the proposed method MLAC-GPA is implemented and compared with five representative methods in three classic benchmarks, Pole Balancing, Inverted Pendulum, and Mountain Car. The result shows MLAC-GPA overcomes the others both in learning rate and sample efficiency.

KW - Actor-critic

KW - Gaussian process

KW - Linear function approximation

KW - Model learning

KW - Planning

UR - http://www.scopus.com/inward/record.url?scp=85084077435&partnerID=8YFLogxK

U2 - 10.1007/s10723-020-09512-4

DO - 10.1007/s10723-020-09512-4

M3 - Article

AN - SCOPUS:85084077435

SN - 1570-7873

VL - 18

SP - 181

EP - 195

JO - Journal of Grid Computing

JF - Journal of Grid Computing

IS - 2

ER -
