TY - JOUR
T1 - A UAV Pursuit-Evasion Game Method Combining MADDPG and Contrastive Learning
AU - Wang, Ruobing
AU - Wang, Xiaofang
N1 - Publisher Copyright:
© 2024 Chinese Society of Astronautics. All rights reserved.
PY - 2024/2
Y1 - 2024/2
N2 - To solve the pursuit-evasion game problem of unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both pursuer and evader are designed under the zero-sum game concept. A centralized training with decentralized execution framework is constructed for the multi-agent deep deterministic policy gradient (MADDPG) algorithm to solve the Nash equilibrium of the pursuit-evasion game. To address the difficulty of analytically representing the high-dimensional capture (escape) regions characterized by the initial positions of the pursuers and evaders, a deep contrastive learning algorithm based on the MADDPG network is built to represent these regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm obtains the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that combining the contrastive learning algorithm with the converged MADDPG network achieves a 95% accuracy rate in representing the high-dimensional capture (escape) regions.
AB - To solve the pursuit-evasion game problem of unmanned aerial vehicles (UAVs) in complex combat environments, a Markov model is established, and reward functions for both pursuer and evader are designed under the zero-sum game concept. A centralized training with decentralized execution framework is constructed for the multi-agent deep deterministic policy gradient (MADDPG) algorithm to solve the Nash equilibrium of the pursuit-evasion game. To address the difficulty of analytically representing the high-dimensional capture (escape) regions characterized by the initial positions of the pursuers and evaders, a deep contrastive learning algorithm based on the MADDPG network is built to represent these regions indirectly through the construction and training of a Siamese network. Simulation results show that the MADDPG algorithm obtains the Nash equilibrium solution of the UAV pursuit-evasion game under the given conditions, and that combining the contrastive learning algorithm with the converged MADDPG network achieves a 95% accuracy rate in representing the high-dimensional capture (escape) regions.
KW - Deep contrastive learning
KW - Multi-agent
KW - Nash equilibrium
KW - Pursuit-evasion game
KW - Reinforcement learning
KW - Unmanned aerial vehicle (UAV)
UR - http://www.scopus.com/inward/record.url?scp=85191591917&partnerID=8YFLogxK
U2 - 10.3873/j.issn.1000-1328.2024.02.011
DO - 10.3873/j.issn.1000-1328.2024.02.011
M3 - Article
AN - SCOPUS:85191591917
SN - 1000-1328
VL - 45
SP - 262
EP - 272
JO - Yuhang Xuebao/Journal of Astronautics
JF - Yuhang Xuebao/Journal of Astronautics
IS - 2
ER -