TY - JOUR
T1 - Reinforcement Learning-Based Optimal Formation Tracking for UAVs With Safety Constraints
AU - Wang, Ping
AU - Yu, Chengpu
AU - Deng, Fang
AU - Chen, Jie
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2026
Y1 - 2026
N2 - This article develops a scheme to tackle the safe optimal formation tracking problem for multiple fixed-wing uncrewed aerial vehicles (UAVs) with external disturbances and asymmetric control constraints. To ensure safety constraints in collision avoidance, a safe set is first constructed as a superlevel set of a continuously differentiable function, followed by a novel control barrier function (CBF) to characterize safety. Subsequently, we transform the safe optimal formation tracking control into a constrained zero-sum (ZS) differential game to mitigate the destabilizing effects of the disturbances, where the cost function is constructed in a nonquadratic form to cope with asymmetric input constraints. In particular, the designed CBF is integrated into the cost function to penalize unsafe behavior, and a damping coefficient is included to balance optimality and safety. Afterward, a critic-only reinforcement learning (RL) strategy is developed to learn the robust safe Nash policy, where the critic weights are updated by applying experience replay, thus avoiding the requirement for the persistence of excitation condition. Moreover, the stability of the presented scheme and the forward invariance of the safe set are also verified. Finally, simulation examples are provided to substantiate the validity of the control scheme.
AB - This article develops a scheme to tackle the safe optimal formation tracking problem for multiple fixed-wing uncrewed aerial vehicles (UAVs) with external disturbances and asymmetric control constraints. To ensure safety constraints in collision avoidance, a safe set is first constructed as a superlevel set of a continuously differentiable function, followed by a novel control barrier function (CBF) to characterize safety. Subsequently, we transform the safe optimal formation tracking control into a constrained zero-sum (ZS) differential game to mitigate the destabilizing effects of the disturbances, where the cost function is constructed in a nonquadratic form to cope with asymmetric input constraints. In particular, the designed CBF is integrated into the cost function to penalize unsafe behavior, and a damping coefficient is included to balance optimality and safety. Afterward, a critic-only reinforcement learning (RL) strategy is developed to learn the robust safe Nash policy, where the critic weights are updated by applying experience replay, thus avoiding the requirement for the persistence of excitation condition. Moreover, the stability of the presented scheme and the forward invariance of the safe set are also verified. Finally, simulation examples are provided to substantiate the validity of the control scheme.
KW - Control barrier function (CBF)
KW - control constraints
KW - formation tracking
KW - safe reinforcement learning (RL)
KW - uncrewed aerial vehicle (UAV)
UR - https://www.scopus.com/pages/publications/105026473224
U2 - 10.1109/TNNLS.2025.3643630
DO - 10.1109/TNNLS.2025.3643630
M3 - Article
AN - SCOPUS:105026473224
SN - 2162-237X
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -