TY - JOUR
T1 - Approximate Optimal Stabilization Control of Servo Mechanisms based on Reinforcement Learning Scheme
AU - Lv, Yongfeng
AU - Ren, Xuemei
AU - Hu, Shuangyi
AU - Xu, Hao
N1 - Publisher Copyright: © 2019, ICROS, KIEE and Springer.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - A reinforcement learning (RL) based adaptive dynamic programming (ADP) scheme is developed to learn the approximate optimal stabilization input of servo mechanisms, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo mechanism model is constructed and a three-layer NN identifier is used to approximate the unknown servo system. The NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based three-layer critic NN is then used to learn the optimal cost function: the NN weights of the first layer are held constant, and the NN weights of the second layer are updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The optimal stabilization input of the servo mechanism is obtained from the three-layer NN identifier and the RL-based critic NN, and it can stabilize the motor speed from its initial value to the given value. Moreover, the convergence of the identifier and the RL-based critic NN is proved, and the stability of the cost function with the proposed optimal input is analyzed. Finally, a servo mechanism model and a complex system are provided to verify the correctness of the proposed methods.
AB - A reinforcement learning (RL) based adaptive dynamic programming (ADP) scheme is developed to learn the approximate optimal stabilization input of servo mechanisms, where the unknown system dynamics are approximated with a three-layer neural network (NN) identifier. First, the servo mechanism model is constructed and a three-layer NN identifier is used to approximate the unknown servo system. The NN weights of both the hidden layer and the output layer are tuned synchronously with an adaptive gradient law. An RL-based three-layer critic NN is then used to learn the optimal cost function: the NN weights of the first layer are held constant, and the NN weights of the second layer are updated by minimizing the squared Hamilton-Jacobi-Bellman (HJB) error. The optimal stabilization input of the servo mechanism is obtained from the three-layer NN identifier and the RL-based critic NN, and it can stabilize the motor speed from its initial value to the given value. Moreover, the convergence of the identifier and the RL-based critic NN is proved, and the stability of the cost function with the proposed optimal input is analyzed. Finally, a servo mechanism model and a complex system are provided to verify the correctness of the proposed methods.
KW - adaptive dynamic programming
KW - neural networks
KW - optimal control
KW - reinforcement learning
KW - servomechanisms
UR - http://www.scopus.com/inward/record.url?scp=85068879434&partnerID=8YFLogxK
U2 - 10.1007/s12555-018-0551-6
DO - 10.1007/s12555-018-0551-6
M3 - Article
AN - SCOPUS:85068879434
SN - 1598-6446
VL - 17
SP - 2655
EP - 2665
JO - International Journal of Control, Automation and Systems
JF - International Journal of Control, Automation and Systems
IS - 10
ER -