TY - JOUR
T1 - Model-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator
AU - Zhong, Shan
AU - Tan, Jack
AU - Dong, Husheng
AU - Chen, Xuemei
AU - Gong, Shengrong
AU - Qian, Zhenjiang
N1 - Publisher Copyright:
© 2020, Springer Nature B.V.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Tasks with continuous state and action spaces are difficult to solve with high sample efficiency. Model learning and planning is a well-known approach to improving sample efficiency: a system dynamics model is learned first and then used for planning. However, if the dynamics model is not captured accurately, convergence slows and sample efficiency remains low. Therefore, to solve problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with a Gaussian process approximator, named MLAC-GPA, is proposed. The Gaussian process is chosen as the modeling method because it captures the noise and uncertainty of the underlying system. The model in MLAC-GPA is first represented by linear function approximation and then modeled by a Gaussian process; the mean vector and covariance matrix of the model parameters are estimated by Bayesian inference. Once learned, the model is used for planning to accelerate convergence of the value function and the policy. Experimentally, MLAC-GPA is implemented and compared with five representative methods on three classic benchmarks: Pole Balancing, Inverted Pendulum, and Mountain Car. The results show that MLAC-GPA outperforms the others in both learning rate and sample efficiency.
AB - Tasks with continuous state and action spaces are difficult to solve with high sample efficiency. Model learning and planning is a well-known approach to improving sample efficiency: a system dynamics model is learned first and then used for planning. However, if the dynamics model is not captured accurately, convergence slows and sample efficiency remains low. Therefore, to solve problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with a Gaussian process approximator, named MLAC-GPA, is proposed. The Gaussian process is chosen as the modeling method because it captures the noise and uncertainty of the underlying system. The model in MLAC-GPA is first represented by linear function approximation and then modeled by a Gaussian process; the mean vector and covariance matrix of the model parameters are estimated by Bayesian inference. Once learned, the model is used for planning to accelerate convergence of the value function and the policy. Experimentally, MLAC-GPA is implemented and compared with five representative methods on three classic benchmarks: Pole Balancing, Inverted Pendulum, and Mountain Car. The results show that MLAC-GPA outperforms the others in both learning rate and sample efficiency.
KW - Actor-critic
KW - Gaussian process
KW - Linear function approximation
KW - Model learning
KW - Planning
UR - http://www.scopus.com/inward/record.url?scp=85084077435&partnerID=8YFLogxK
U2 - 10.1007/s10723-020-09512-4
DO - 10.1007/s10723-020-09512-4
M3 - Article
AN - SCOPUS:85084077435
SN - 1570-7873
VL - 18
SP - 181
EP - 195
JO - Journal of Grid Computing
JF - Journal of Grid Computing
IS - 2
ER -
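
The model-learning step the abstract describes (linear function approximation of the dynamics, with the mean vector and covariance matrix of the model parameters estimated by Bayesian inference) can be sketched as Bayesian linear regression over transition features, which is equivalent to a Gaussian process with a linear kernel on those features. The sketch below is illustrative only, under assumed features and hyperparameters; it is not the paper's implementation, and all names here are hypothetical.

import numpy as np

class BayesianDynamicsModel:
    """Bayesian linear model of the dynamics over features phi(s, a).

    Equivalent to a Gaussian process with a linear kernel on the features:
    the model is represented by linear function approximation, and the
    mean vector and covariance matrix of the model parameters are
    maintained by Bayesian updates. Names and defaults are assumptions.
    """

    def __init__(self, num_features, noise_var=0.1, prior_var=1.0):
        self.noise_var = noise_var                    # assumed observation noise variance
        self.mean = np.zeros(num_features)            # posterior mean vector of the parameters
        self.cov = prior_var * np.eye(num_features)   # posterior covariance matrix

    def update(self, Phi, y):
        # Standard Bayesian linear-regression update from a feature
        # matrix Phi (n x d) and next-state targets y (n,).
        prec_old = np.linalg.inv(self.cov)
        prec_new = prec_old + Phi.T @ Phi / self.noise_var
        self.cov = np.linalg.inv(prec_new)
        self.mean = self.cov @ (prec_old @ self.mean + Phi.T @ y / self.noise_var)

    def predict(self, phi):
        # Predictive mean and variance for one feature vector phi (d,);
        # the variance quantifies model uncertainty, which the abstract
        # credits the Gaussian process with capturing.
        mu = phi @ self.mean
        var = phi @ self.cov @ phi + self.noise_var
        return mu, var

# Illustrative use on synthetic transitions: fit the model on observed
# (feature, next-state) pairs, then query it during planning rollouts.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 3))                  # features of (state, action) pairs
y = Phi @ np.array([0.5, -0.2, 1.0]) + 0.1 * rng.standard_normal(50)
model = BayesianDynamicsModel(num_features=3)
model.update(Phi, y)
mu, var = model.predict(np.array([1.0, 0.0, 0.5]))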