TY - JOUR
T1 - Inverse Model Predictive Control
T2 - Learning Optimal Control Cost Functions for MPC
AU - Zhang, Fawang
AU - Duan, Jingliang
AU - Xu, Haoyuan
AU - Chen, Hao
AU - Liu, Hui
AU - Nie, Shida
AU - Li, Shengbo Eben
N1 - Publisher Copyright: © 2005-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Inverse optimal control (IOC) seeks to infer a control cost function that captures the underlying goals and preferences of expert demonstrations. While significant progress has been made in finite-horizon IOC, which focuses on learning control cost functions based on rollout trajectories rather than actual trajectories, the application of IOC to receding horizon control, also known as model predictive control (MPC), has been overlooked. MPC is more prevalent in practical settings and poses additional challenges for IOC learning since it is complicated to calculate the gradient of actual trajectories with respect to cost parameters. In light of this, we propose the inverse MPC (IMPC) method to identify the optimal cost function that effectively minimizes the discrepancy between the actual trajectory and its associated demonstration. To compute the gradient of actual trajectories with respect to cost parameters, we first establish two differential Pontryagin's maximum principle (PMP) conditions by differentiating the traditional PMP conditions with respect to cost parameters and initial states, respectively. We then formulate two auxiliary optimal control problems based on the derived differentiated PMP conditions, whose solutions can be directly used to determine the gradient for updating cost parameters. We validate the efficacy of the proposed method through experiments involving five simulation tasks and two real-world mobile robot control tasks. The results consistently demonstrate that IMPC outperforms existing finite-horizon IOC methods across all experiments.
KW - Bilevel optimization
KW - imitation learning
KW - inverse model predictive control (IMPC)
UR - http://www.scopus.com/inward/record.url?scp=85206945495&partnerID=8YFLogxK
U2 - 10.1109/TII.2024.3424238
DO - 10.1109/TII.2024.3424238
M3 - Article
AN - SCOPUS:85206945495
SN - 1551-3203
VL - 20
SP - 13644
EP - 13655
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 12
ER -