Abstract
This paper investigates the optimal control problem for an unknown linear time-invariant (LTI) system. To solve this problem, a novel composite policy iteration (CPI) algorithm based on adaptive dynamic programming is developed to adaptively learn the optimal control policy from system data. Existing methods require an initial stabilizing control policy, the persistence of excitation (PE) condition, and data storage to ensure convergence of the algorithm. In contrast, the proposed method relaxes these restrictions. Specifically, an adaptive parameter is designed to remove the requirement of an initial stabilizing control policy. In addition, an online data calculation scheme is proposed that not only replaces stored historical data with online data but also relaxes the PE condition to the interval excitation (IE) condition. Simulation results demonstrate the efficacy of the proposed algorithm, and a comparison with existing algorithms demonstrates its superiority.
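To make the role of the initial stabilizing control policy concrete, the following is a minimal sketch of classical model-based policy iteration (Kleinman's iteration) for a scalar linear-quadratic problem. It is not the CPI algorithm of this paper: it assumes the system parameters are known, and the scalar system dx/dt = a*x + b*u, the cost weights `q` and `r`, and all function names are illustrative assumptions. It only shows the baseline restriction that the abstract says the proposed method removes.

```python
import math


def evaluate_policy(a, b, q, r, k):
    """Policy evaluation for dx/dt = a*x + b*u under u = -k*x.

    Solves the scalar Lyapunov equation 2*(a - b*k)*p + q + r*k**2 = 0.
    The solution is a valid cost only if the closed loop is stable.
    """
    a_cl = a - b * k
    if a_cl >= 0:
        # Classical policy iteration breaks down here: the cost is infinite.
        raise ValueError("gain k is not stabilizing")
    return (q + r * k * k) / (-2.0 * a_cl)


def improve_policy(b, r, p):
    """Policy improvement: the gain that is greedy with respect to cost p."""
    return b * p / r


def policy_iteration(a, b, q, r, k0, iterations=20):
    """Alternate evaluation and improvement, starting from the gain k0."""
    k = k0
    p = None
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, k)
        k = improve_policy(b, r, p)
    return p, k


def riccati_solution(a, b, q, r):
    """Closed-form positive root of 2*a*p - (b**2 / r)*p**2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

With the assumed values a = b = q = r = 1, starting from the stabilizing gain `k0 = 3.0` the iterates approach the Riccati solution, whereas starting from `k0 = 0.0` (an unstable closed loop) the evaluation step raises an error. This is the failure mode that motivates removing the stabilizing-initialization requirement.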
Original language | English |
---|---|
Journal | IEEE Transactions on Automatic Control |
Publication status | Accepted/In press - 2025 |
Keywords
- Adaptive dynamic programming
- Optimal control
- Policy iteration