Inverse Model Predictive Control: Learning Optimal Control Cost Functions for MPC

Fawang Zhang; Jingliang Duan; Haoyuan Xu; Hao Chen; Hui Liu; Shida Nie; Shengbo Eben Li

doi:10.1109/TII.2024.3424238

Fawang Zhang, Jingliang Duan*, Haoyuan Xu, Hao Chen, Hui Liu, Shida Nie, Shengbo Eben Li

*Corresponding author for this work

School of Mechanical Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

Inverse optimal control (IOC) seeks to infer a control cost function that captures the underlying goals and preferences of expert demonstrations. While significant progress has been made in finite-horizon IOC, which focuses on learning control cost functions based on rollout trajectories rather than actual trajectories, the application of IOC to receding horizon control, also known as model predictive control (MPC), has been overlooked. MPC is more prevalent in practical settings and poses additional challenges for IOC learning since it is complicated to calculate the gradient of actual trajectories with respect to cost parameters. In light of this, we propose the inverse MPC (IMPC) method to identify the optimal cost function that effectively minimizes the discrepancy between the actual trajectory and its associated demonstration. To compute the gradient of actual trajectories with respect to cost parameters, we first establish two differential Pontryagin's maximum principle (PMP) conditions by differentiating the traditional PMP conditions with respect to cost parameters and initial states, respectively. We then formulate two auxiliary optimal control problems based on the derived differentiated PMP conditions, whose solutions can be directly used to determine the gradient for updating cost parameters. We validate the efficacy of the proposed method through experiments involving five simulation tasks and two real-world mobile robot control tasks. The results consistently demonstrate that IMPC outperforms existing finite-horizon IOC methods across all experiments.
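
To make the idea of fitting a cost function to the actual (closed-loop) trajectory concrete, here is a minimal sketch on a hypothetical scalar linear-quadratic problem. It is not the authors' implementation. The dynamics `A` and `B`, the fixed control weight `R`, the horizon, and all function names are assumptions introduced for illustration. The only unknown cost parameter is the state weight `q`, and a central finite-difference gradient stands in for the differentiated-PMP gradient described in the abstract.

```python
import numpy as np

# Hypothetical scalar system x_next = A*x + B*u with stage cost q*x^2 + R*u^2.
A, B, R = 1.0, 0.5, 1.0
HORIZON, STEPS = 5, 20  # MPC prediction horizon and closed-loop length


def mpc_gain(q):
    """Feedback gain for the first step of the horizon-N LQ problem."""
    p = q  # terminal weight, assumed equal to q
    k = 0.0
    for _ in range(HORIZON):  # backward Riccati recursion
        k = A * B * p / (R + B * B * p)
        p = q + A * p * (A - B * k)
    return k


def actual_trajectory(q, x0=1.0):
    """Closed-loop states: at each step only the first planned control is applied."""
    k = mpc_gain(q)
    xs = [x0]
    for _ in range(STEPS):
        xs.append((A - B * k) * xs[-1])
    return np.array(xs)


def loss(q, demo):
    """Squared discrepancy between the actual trajectory and the demonstration."""
    return float(np.sum((actual_trajectory(q) - demo) ** 2))


def learn_q(demo, q_init=0.5, lr=1.0, iters=500, eps=1e-4):
    """Gradient descent on theta = log(q) using central finite differences."""
    theta = np.log(q_init)
    for _ in range(iters):
        up = loss(np.exp(theta + eps), demo)
        down = loss(np.exp(theta - eps), demo)
        theta -= lr * (up - down) / (2 * eps)
    return float(np.exp(theta))


if __name__ == "__main__":
    demo = actual_trajectory(2.0)  # demonstration generated with q = 2
    print("recovered q:", learn_q(demo))
```

In this toy setting the MPC law reduces to a linear feedback, so the closed-loop trajectory can be simulated directly. The log parameterization only serves to keep `q` positive during the update.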

Original language	English
Pages (from-to)	13644-13655
Number of pages	12
Journal	IEEE Transactions on Industrial Informatics
Volume	20
Issue number	12
DOIs	https://doi.org/10.1109/TII.2024.3424238
Publication status	Published - 2024

Keywords

Bilevel optimization
imitation learning
inverse model predictive control (IMPC)

Cite this

@article{8a6d989242054f61a5c2863f68ff2080,

title = "Inverse Model Predictive Control: Learning Optimal Control Cost Functions for MPC",

abstract = "Inverse optimal control (IOC) seeks to infer a control cost function that captures the underlying goals and preferences of expert demonstrations. While significant progress has been made in finite-horizon IOC, which focuses on learning control cost functions based on rollout trajectories rather than actual trajectories, the application of IOC to receding horizon control, also known as model predictive control (MPC), has been overlooked. MPC is more prevalent in practical settings and poses additional challenges for IOC learning since it is complicated to calculate the gradient of actual trajectories with respect to cost parameters. In light of this, we propose the inverse MPC (IMPC) method to identify the optimal cost function that effectively minimizes the discrepancy between the actual trajectory and its associated demonstration. To compute the gradient of actual trajectories with respect to cost parameters, we first establish two differential Pontryagin's maximum principle (PMP) conditions by differentiating the traditional PMP conditions with respect to cost parameters and initial states, respectively. We then formulate two auxiliary optimal control problems based on the derived differentiated PMP conditions, whose solutions can be directly used to determine the gradient for updating cost parameters. We validate the efficacy of the proposed method through experiments involving five simulation tasks and two real-world mobile robot control tasks. The results consistently demonstrate that IMPC outperforms existing finite-horizon IOC methods across all experiments.",

keywords = "Bilevel optimization, imitation learning, inverse model predictive control (IMPC)",

author = "Fawang Zhang and Jingliang Duan and Haoyuan Xu and Hao Chen and Hui Liu and Shida Nie and Li, {Shengbo Eben}",

note = "Publisher Copyright: {\textcopyright} 2005-2012 IEEE.",

year = "2024",

doi = "10.1109/TII.2024.3424238",

language = "English",

volume = "20",

pages = "13644--13655",

journal = "IEEE Transactions on Industrial Informatics",

issn = "1551-3203",

publisher = "IEEE Computer Society",

number = "12",

}

TY - JOUR

T1 - Inverse Model Predictive Control

T2 - Learning Optimal Control Cost Functions for MPC

AU - Zhang, Fawang

AU - Duan, Jingliang

AU - Xu, Haoyuan

AU - Chen, Hao

AU - Liu, Hui

AU - Nie, Shida

AU - Li, Shengbo Eben

PY - 2024

Y1 - 2024

N2 - Inverse optimal control (IOC) seeks to infer a control cost function that captures the underlying goals and preferences of expert demonstrations. While significant progress has been made in finite-horizon IOC, which focuses on learning control cost functions based on rollout trajectories rather than actual trajectories, the application of IOC to receding horizon control, also known as model predictive control (MPC), has been overlooked. MPC is more prevalent in practical settings and poses additional challenges for IOC learning since it is complicated to calculate the gradient of actual trajectories with respect to cost parameters. In light of this, we propose the inverse MPC (IMPC) method to identify the optimal cost function that effectively minimizes the discrepancy between the actual trajectory and its associated demonstration. To compute the gradient of actual trajectories with respect to cost parameters, we first establish two differential Pontryagin's maximum principle (PMP) conditions by differentiating the traditional PMP conditions with respect to cost parameters and initial states, respectively. We then formulate two auxiliary optimal control problems based on the derived differentiated PMP conditions, whose solutions can be directly used to determine the gradient for updating cost parameters. We validate the efficacy of the proposed method through experiments involving five simulation tasks and two real-world mobile robot control tasks. The results consistently demonstrate that IMPC outperforms existing finite-horizon IOC methods across all experiments.

KW - Bilevel optimization

KW - imitation learning

KW - inverse model predictive control (IMPC)

UR - http://www.scopus.com/inward/record.url?scp=85206945495&partnerID=8YFLogxK

U2 - 10.1109/TII.2024.3424238

DO - 10.1109/TII.2024.3424238

M3 - Article

AN - SCOPUS:85206945495

SN - 1551-3203

VL - 20

SP - 13644

EP - 13655

JO - IEEE Transactions on Industrial Informatics

JF - IEEE Transactions on Industrial Informatics

IS - 12

ER -
