TY - JOUR
T1 - Data-Efficient Off-Policy Learning for Distributed Optimal Tracking Control of HMAS With Unidentified Exosystem Dynamics
AU - Xu, Yong
AU - Wu, Zheng-Guang
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2024/3/1
Y1 - 2024/3/1
AB - In this article, a data-efficient off-policy reinforcement learning (RL) approach is proposed for the distributed output tracking control of heterogeneous multiagent systems (HMASs) using approximate dynamic programming (ADP). Unlike existing results, in which the dynamics of the exosystem are accessible to some or all agents, the dynamics of the exosystem are assumed here to be completely unknown to all agents. To overcome this difficulty, an identification algorithm based on the experience-replay method is designed for each agent to identify the system matrices of a novel reference model rather than those of the original exosystem. Then, an output-based distributed adaptive output observer is proposed to estimate the leader; the proposed observer not only has a low dimension and requires less data transmission among agents but is also implemented in a fully distributed way. In addition, a data-efficient off-policy RL algorithm is given to design the optimal controller using data collected along the system trajectories, without solving the output regulator equations. An ADP approach is developed to iteratively solve the game algebraic Riccati equations (GAREs) online using measured state and input information, which removes the requirement of prior offline knowledge of the agents' system matrices. Finally, a numerical example is provided to verify the effectiveness of the theoretical analysis.
KW - Adaptive observer
KW - approximate dynamic programming (ADP)
KW - heterogeneous multiagent systems (HMASs)
KW - output tracking
KW - reinforcement learning (RL)
UR - http://www.scopus.com/inward/record.url?scp=85130458244&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2022.3172130
DO - 10.1109/TNNLS.2022.3172130
M3 - Article
C2 - 35594235
AN - SCOPUS:85130458244
SN - 2162-237X
VL - 35
SP - 3181
EP - 3190
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 3
ER -