Learning-based model predictive control under value iteration with finite approximation errors

Min Lin; Yuanqing Xia; Zhongqi Sun; Li Dai

doi:10.1002/rnc.7117


Min Lin, Yuanqing Xia^*, Zhongqi Sun, Li Dai

^*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Article › Peer-review

Abstract

This paper proposes a novel learning-based model predictive control (LMPC) scheme for discrete-time nonlinear systems. It overcomes the challenge of manually designing the terminal conditions for traditional MPC and enhances the control performance. The scheme employs the value iteration (VI) in reinforcement learning (RL), and autonomously designs the terminal cost by iteratively performing value function learning and policy update under known dynamics and constraints. In contrast to the existing schemes that combine RL with MPC, the proposed scheme explicitly considers the approximation errors in each iteration. Further, a rigorous theoretical analysis is provided, including the convergence of VI, the stability and performance of the closed-loop system. In addition, the influences of the prediction horizon and the initial terminal cost on performance are also investigated. Simulation results of a linear system verify the theoretical properties of the LMPC and show that it achieves (near-)optimal performance. Moreover, its unique superiority over traditional MPC is fully demonstrated in a nonholonomic vehicle regulation example.
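To make the value-iteration idea concrete, the sketch below treats only the simplest special case: a scalar linear system with quadratic stage cost, where the value function stays quadratic and each exact VI step reduces to a Riccati update. It is not the authors' algorithm. The paper addresses nonlinear systems and learns the value function approximately (its keywords list Gaussian process regression), whereas the function names, the plant numbers, and the way a bounded per-iteration error `eps` is injected here are illustrative assumptions.

```python
import math


def riccati_step(p, a, b, q, r):
    """One exact VI (Bellman) update for the scalar LQ problem.

    If V_k(x) = p * x**2, then minimizing q*x**2 + r*u**2 + V_k(a*x + b*u)
    over u gives V_{k+1}(x) = riccati_step(p, ...) * x**2.
    """
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)


def value_iteration(a, b, q, r, p0=0.0, iters=200, eps=0.0):
    """Iterate from V_0(x) = p0 * x**2 for a fixed number of steps.

    eps (hypothetical) bounds a per-iteration approximation error; it is
    injected deterministically as +eps/-eps to mimic an inexact learner.
    """
    p = p0
    for k in range(iters):
        err = eps if k % 2 == 0 else -eps
        p = riccati_step(p, a, b, q, r) + err
    return p


def dare_scalar(a, b, q, r):
    """Positive root of the scalar discrete algebraic Riccati equation."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


def mpc_gain(a, b, q, r, p_terminal, horizon):
    """First-step gain k (u_0 = -k * x_0) of unconstrained N-step MPC
    whose terminal cost is p_terminal * x**2."""
    p = p_terminal
    for _ in range(horizon - 1):  # backward recursion from the terminal cost
        p = riccati_step(p, a, b, q, r)
    return a * b * p / (r + b * b * p)


# Illustrative open-loop unstable plant x+ = 1.2 x + u with q = r = 1.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p_star = dare_scalar(a, b, q, r)
p_vi = value_iteration(a, b, q, r)                  # exact VI
p_vi_noisy = value_iteration(a, b, q, r, eps=1e-3)  # VI with bounded errors
k_short = mpc_gain(a, b, q, r, p_vi, horizon=1)     # 1-step MPC, learned terminal cost
print(p_star, p_vi, p_vi_noisy, k_short)
```

In this toy case, exact VI started from a zero terminal cost converges to the Riccati solution, while the error-perturbed iterate settles in a neighborhood of it whose size scales with `eps`. Using the learned quadratic as the MPC terminal cost recovers the infinite-horizon gain even with a one-step horizon, since there are no constraints here.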

Original language	English
Pages (from-to)	2946-2971
Number of pages	26
Journal	International Journal of Robust and Nonlinear Control
Volume	34
Issue number	4
DOI	https://doi.org/10.1002/rnc.7117
Publication status	Published - 10 Mar 2024


Cite this

@article{900b9ef5e1da4696a6451f3050383fd2,

title = "Learning-based model predictive control under value iteration with finite approximation errors",

abstract = "This paper proposes a novel learning-based model predictive control (LMPC) scheme for discrete-time nonlinear systems. It overcomes the challenge of manually designing the terminal conditions for traditional MPC and enhances the control performance. The scheme employs the value iteration (VI) in reinforcement learning (RL), and autonomously designs the terminal cost by iteratively performing value function learning and policy update under known dynamics and constraints. In contrast to the existing schemes that combine RL with MPC, the proposed scheme explicitly considers the approximation errors in each iteration. Further, a rigorous theoretical analysis is provided, including the convergence of VI, the stability and performance of the closed-loop system. In addition, the influences of the prediction horizon and the initial terminal cost on performance are also investigated. Simulation results of a linear system verify the theoretical properties of the LMPC and show that it achieves (near-)optimal performance. Moreover, its unique superiority over traditional MPC is fully demonstrated in a nonholonomic vehicle regulation example.",

keywords = "Gaussian process regression, approximation error, model predictive control, reinforcement learning, value iteration",

author = "Min Lin and Yuanqing Xia and Zhongqi Sun and Li Dai",

note = "Publisher Copyright: {\textcopyright} 2023 John Wiley \& Sons Ltd.",

year = "2024",

month = mar,

day = "10",

doi = "10.1002/rnc.7117",

language = "English",

volume = "34",

pages = "2946--2971",

journal = "International Journal of Robust and Nonlinear Control",

issn = "1049-8923",

publisher = "John Wiley and Sons Ltd",

number = "4",

}

TY  - JOUR
T1  - Learning-based model predictive control under value iteration with finite approximation errors
AU  - Lin, Min
AU  - Xia, Yuanqing
AU  - Sun, Zhongqi
AU  - Dai, Li
PY  - 2024/3/10
Y1  - 2024/3/10
N2  - This paper proposes a novel learning-based model predictive control (LMPC) scheme for discrete-time nonlinear systems. It overcomes the challenge of manually designing the terminal conditions for traditional MPC and enhances the control performance. The scheme employs the value iteration (VI) in reinforcement learning (RL), and autonomously designs the terminal cost by iteratively performing value function learning and policy update under known dynamics and constraints. In contrast to the existing schemes that combine RL with MPC, the proposed scheme explicitly considers the approximation errors in each iteration. Further, a rigorous theoretical analysis is provided, including the convergence of VI, the stability and performance of the closed-loop system. In addition, the influences of the prediction horizon and the initial terminal cost on performance are also investigated. Simulation results of a linear system verify the theoretical properties of the LMPC and show that it achieves (near-)optimal performance. Moreover, its unique superiority over traditional MPC is fully demonstrated in a nonholonomic vehicle regulation example.
AB  - This paper proposes a novel learning-based model predictive control (LMPC) scheme for discrete-time nonlinear systems. It overcomes the challenge of manually designing the terminal conditions for traditional MPC and enhances the control performance. The scheme employs the value iteration (VI) in reinforcement learning (RL), and autonomously designs the terminal cost by iteratively performing value function learning and policy update under known dynamics and constraints. In contrast to the existing schemes that combine RL with MPC, the proposed scheme explicitly considers the approximation errors in each iteration. Further, a rigorous theoretical analysis is provided, including the convergence of VI, the stability and performance of the closed-loop system. In addition, the influences of the prediction horizon and the initial terminal cost on performance are also investigated. Simulation results of a linear system verify the theoretical properties of the LMPC and show that it achieves (near-)optimal performance. Moreover, its unique superiority over traditional MPC is fully demonstrated in a nonholonomic vehicle regulation example.
KW  - Gaussian process regression
KW  - approximation error
KW  - model predictive control
KW  - reinforcement learning
KW  - value iteration
UR  - http://www.scopus.com/inward/record.url?scp=85178395251&partnerID=8YFLogxK
U2  - 10.1002/rnc.7117
DO  - 10.1002/rnc.7117
M3  - Article
AN  - SCOPUS:85178395251
SN  - 1049-8923
VL  - 34
SP  - 2946
EP  - 2971
JO  - International Journal of Robust and Nonlinear Control
JF  - International Journal of Robust and Nonlinear Control
IS  - 4
ER  - 
