TY - JOUR
T1 - Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling
AU - Chen, Can
AU - Song, Tao
AU - Mo, Li
AU - Lv, Maolong
AU - Lin, Defu
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/3
Y1 - 2025/3
N2 - Autonomous air combat has attracted a surge of interest, propelled by rapid progress in artificial intelligence. A persistent challenge in this domain is autonomous decision-making for dogfighting, especially when dealing with intricate, high-fidelity nonlinear aircraft dynamics and incomplete information. To address this challenge, this paper applies reinforcement learning (RL) to train maneuvering strategies. In RL for dogfighting, the way opponents are sampled largely determines the efficacy of training. Accordingly, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework in which proximal policy optimization (PPO) is applied. The approach comprises three pivotal components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a sliding-window-based Gate Function. Training results demonstrate that the improved PPO algorithm with the AOS framework outperforms existing RL methods such as the soft actor–critic (SAC) algorithm and PPO with prioritized fictitious self-play (PFSP). Moreover, in testing scenarios, the trained maneuvering policy displays remarkable adaptability against a diverse array of opponents. This work represents a substantial step toward robust autonomous maneuvering decision systems for modern air combat.
KW - air combat
KW - automatic opponent sampling
KW - autonomous decision-making
KW - dogfight
KW - proximal policy optimization
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=105001130613&partnerID=8YFLogxK
DO - 10.3390/aerospace12030265
M3 - Article
AN - SCOPUS:105001130613
SN - 2226-4310
VL - 12
JO - Aerospace
JF - Aerospace
IS - 3
M1 - 265
ER -