TY - JOUR
T1 - Model-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator
AU - Zhong, Shan
AU - Tan, Jack
AU - Dong, Husheng
AU - Chen, Xuemei
AU - Gong, Shengrong
AU - Qian, Zhenjiang
N1 - Publisher Copyright:
© 2020, Springer Nature B.V.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Tasks with continuous state and action spaces are difficult to solve with high sample efficiency. Model learning and planning is a well-known approach to improving sample efficiency: a system dynamics model is learned first and then used for planning. However, if the dynamics model is not captured accurately, convergence slows and sample efficiency remains low. Therefore, to solve problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with a Gaussian process approximator, named MLAC-GPA, is proposed. The Gaussian process is chosen as the modeling method because it captures the noise and uncertainty of the underlying system. The model in MLAC-GPA is first represented by linear function approximation and then modeled by a Gaussian process; the mean vector and covariance matrix of the model parameters are estimated by Bayesian inference. Once learned, the model is used for planning to accelerate convergence of the value function and the policy. Experimentally, MLAC-GPA is implemented and compared with five representative methods on three classic benchmarks: Pole Balancing, Inverted Pendulum, and Mountain Car. The results show that MLAC-GPA outperforms the others in both learning rate and sample efficiency.
AB - Tasks with continuous state and action spaces are difficult to solve with high sample efficiency. Model learning and planning is a well-known approach to improving sample efficiency: a system dynamics model is learned first and then used for planning. However, if the dynamics model is not captured accurately, convergence slows and sample efficiency remains low. Therefore, to solve problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with a Gaussian process approximator, named MLAC-GPA, is proposed. The Gaussian process is chosen as the modeling method because it captures the noise and uncertainty of the underlying system. The model in MLAC-GPA is first represented by linear function approximation and then modeled by a Gaussian process; the mean vector and covariance matrix of the model parameters are estimated by Bayesian inference. Once learned, the model is used for planning to accelerate convergence of the value function and the policy. Experimentally, MLAC-GPA is implemented and compared with five representative methods on three classic benchmarks: Pole Balancing, Inverted Pendulum, and Mountain Car. The results show that MLAC-GPA outperforms the others in both learning rate and sample efficiency.
KW - Actor-critic
KW - Gaussian process
KW - Linear function approximation
KW - Model learning
KW - Planning
UR - http://www.scopus.com/inward/record.url?scp=85084077435&partnerID=8YFLogxK
U2 - 10.1007/s10723-020-09512-4
DO - 10.1007/s10723-020-09512-4
M3 - Article
AN - SCOPUS:85084077435
SN - 1570-7873
VL - 18
SP - 181
EP - 195
JO - Journal of Grid Computing
JF - Journal of Grid Computing
IS - 2
ER -
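
The model-learning step the abstract describes (linear function approximation of the dynamics, with the mean vector and covariance matrix of the model parameters estimated by Bayesian inference) can be sketched as Bayesian linear regression over transition features, which is equivalent to a Gaussian process with a linear kernel on those features. The sketch below is illustrative only, under assumed features and hyperparameters; it is not the paper's implementation, and all names here are hypothetical.

import numpy as np

class BayesianDynamicsModel:
    """Bayesian linear model of the dynamics over features phi(s, a).

    Equivalent to a Gaussian process with a linear kernel on the features:
    the model is represented by linear function approximation, and the
    mean vector and covariance matrix of the model parameters are
    maintained by Bayesian updates. Names and defaults are assumptions.
    """

    def __init__(self, num_features, noise_var=0.1, prior_var=1.0):
        self.noise_var = noise_var                    # assumed observation noise variance
        self.mean = np.zeros(num_features)            # posterior mean vector of the parameters
        self.cov = prior_var * np.eye(num_features)   # posterior covariance matrix

    def update(self, Phi, y):
        # Standard Bayesian linear-regression update from a feature
        # matrix Phi (n x d) and next-state targets y (n,).
        prec_old = np.linalg.inv(self.cov)
        prec_new = prec_old + Phi.T @ Phi / self.noise_var
        self.cov = np.linalg.inv(prec_new)
        self.mean = self.cov @ (prec_old @ self.mean + Phi.T @ y / self.noise_var)

    def predict(self, phi):
        # Predictive mean and variance for one feature vector phi (d,);
        # the variance quantifies model uncertainty, which the abstract
        # credits the Gaussian process with capturing.
        mu = phi @ self.mean
        var = phi @ self.cov @ phi + self.noise_var
        return mu, var

# Illustrative use on synthetic transitions: fit the model on observed
# (feature, next-state) pairs, then query it during planning rollouts.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 3))                  # features of (state, action) pairs
y = Phi @ np.array([0.5, -0.2, 1.0]) + 0.1 * rng.standard_normal(50)
model = BayesianDynamicsModel(num_features=3)
model.update(Phi, y)
mu, var = model.predict(np.array([1.0, 0.0, 0.5]))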