TY - JOUR
T1 - Model Predictive Control-Based Value Estimation for Efficient Reinforcement Learning
AU - Wu, Qizhen
AU - Liu, Kexin
AU - Chen, Lei
N1 - Publisher Copyright:
© 2001-2011 IEEE.
PY - 2024/5/1
Y1 - 2024/5/1
N2 - Reinforcement learning (RL) is limited in practice primarily by the large number of interactions with virtual environments that it requires. This poses a challenging problem because, for many learning methods, a locally optimal strategy cannot plausibly be obtained with only a few attempts. Here, we design an improved RL method based on model predictive control that models the environment through a data-driven approach. Based on the learned environment model, it performs multistep prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergence of strategies toward the locally optimal value, and a smaller sample capacity required by experience replay buffers. Experimental results, both in classic databases and in a dynamic obstacle-avoidance scenario for an unmanned aerial vehicle, validate the proposed approach.
AB - Reinforcement learning (RL) is limited in practice primarily by the large number of interactions with virtual environments that it requires. This poses a challenging problem because, for many learning methods, a locally optimal strategy cannot plausibly be obtained with only a few attempts. Here, we design an improved RL method based on model predictive control that models the environment through a data-driven approach. Based on the learned environment model, it performs multistep prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergence of strategies toward the locally optimal value, and a smaller sample capacity required by experience replay buffers. Experimental results, both in classic databases and in a dynamic obstacle-avoidance scenario for an unmanned aerial vehicle, validate the proposed approach.
UR - http://www.scopus.com/inward/record.url?scp=85190170790&partnerID=8YFLogxK
U2 - 10.1109/MIS.2024.3386204
DO - 10.1109/MIS.2024.3386204
M3 - Article
AN - SCOPUS:85190170790
SN - 1541-1672
VL - 39
SP - 63
EP - 72
JO - IEEE Intelligent Systems
JF - IEEE Intelligent Systems
IS - 3
ER -