Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling

Can Chen; Tao Song; Li Mo; Maolong Lv; Defu Lin

doi:10.3390/aerospace12030265

Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling

Can Chen, Tao Song, Li Mo^*, Maolong Lv, Defu Lin

^*Corresponding author for this work

School of Aerospace Engineering

Research output: Contribution to journal › Article › peer-review

Abstract

The field of autonomous air combat has witnessed a surge in interest propelled by the rapid progress of artificial intelligence technology. A persistent challenge within this domain pertains to autonomous decision-making for dogfighting, especially when dealing with intricate, high-fidelity nonlinear aircraft dynamic models and insufficient information. In response to this challenge, this paper introduces reinforcement learning (RL) to train maneuvering strategies. In the context of RL for dogfighting, the method by which opponents are sampled assumes significance in determining the efficacy of training. Consequently, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework where proximal policy optimization (PPO) is applied. This approach encompasses three pivotal components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a Gate Function based on the sliding window. The training outcomes demonstrate that this improved PPO algorithm with an AOS framework outperforms existing reinforcement learning methods such as the soft actor–critic (SAC) algorithm and the PPO algorithm with prioritized fictitious self-play (PFSP). Moreover, during testing scenarios, the trained maneuvering policy displays remarkable adaptability when confronted with a diverse array of opponents. This research signifies a substantial stride towards the realization of robust autonomous maneuvering decision systems in the context of modern air combat.

Original language	English
Article number	265
Journal	Aerospace
Volume	12
Issue number	3
DOIs	https://doi.org/10.3390/aerospace12030265
Publication status	Published - Mar 2025

Keywords

air combat
automatic opponent sampling
autonomous decision-making
dogfight
proximal policy optimization
reinforcement learning

Access to Document

10.3390/aerospace12030265

Cite this

Chen, C., Song, T., Mo, L., Lv, M., & Lin, D. (2025). Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling. Aerospace, 12(3), Article 265. https://doi.org/10.3390/aerospace12030265

@article{33815bc8f8b74e098739a11896949982,

title = "Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling",

abstract = "The field of autonomous air combat has witnessed a surge in interest propelled by the rapid progress of artificial intelligence technology. A persistent challenge within this domain pertains to autonomous decision-making for dogfighting, especially when dealing with intricate, high-fidelity nonlinear aircraft dynamic models and insufficient information. In response to this challenge, this paper introduces reinforcement learning (RL) to train maneuvering strategies. In the context of RL for dogfighting, the method by which opponents are sampled assumes significance in determining the efficacy of training. Consequently, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework where proximal policy optimization (PPO) is applied. This approach encompasses three pivotal components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a Gate Function based on the sliding window. The training outcomes demonstrate that this improved PPO algorithm with an AOS framework outperforms existing reinforcement learning methods such as the soft actor–critic (SAC) algorithm and the PPO algorithm with prioritized fictitious self-play (PFSP). Moreover, during testing scenarios, the trained maneuvering policy displays remarkable adaptability when confronted with a diverse array of opponents. This research signifies a substantial stride towards the realization of robust autonomous maneuvering decision systems in the context of modern air combat.",

keywords = "air combat, automatic opponent sampling, autonomous decision-making, dogfight, proximal policy optimization, reinforcement learning",

author = "Can Chen and Tao Song and Li Mo and Maolong Lv and Defu Lin",

note = "Publisher Copyright: {\textcopyright} 2025 by the authors.",

year = "2025",

month = mar,

doi = "10.3390/aerospace12030265",

language = "English",

volume = "12",

journal = "Aerospace",

issn = "2226-4310",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "3",

}

TY - JOUR

T1 - Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling

AU - Chen, Can

AU - Song, Tao

AU - Mo, Li

AU - Lv, Maolong

AU - Lin, Defu

PY - 2025/3

Y1 - 2025/3

N2 - The field of autonomous air combat has witnessed a surge in interest propelled by the rapid progress of artificial intelligence technology. A persistent challenge within this domain pertains to autonomous decision-making for dogfighting, especially when dealing with intricate, high-fidelity nonlinear aircraft dynamic models and insufficient information. In response to this challenge, this paper introduces reinforcement learning (RL) to train maneuvering strategies. In the context of RL for dogfighting, the method by which opponents are sampled assumes significance in determining the efficacy of training. Consequently, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework where proximal policy optimization (PPO) is applied. This approach encompasses three pivotal components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a Gate Function based on the sliding window. The training outcomes demonstrate that this improved PPO algorithm with an AOS framework outperforms existing reinforcement learning methods such as the soft actor–critic (SAC) algorithm and the PPO algorithm with prioritized fictitious self-play (PFSP). Moreover, during testing scenarios, the trained maneuvering policy displays remarkable adaptability when confronted with a diverse array of opponents. This research signifies a substantial stride towards the realization of robust autonomous maneuvering decision systems in the context of modern air combat.

AB - The field of autonomous air combat has witnessed a surge in interest propelled by the rapid progress of artificial intelligence technology. A persistent challenge within this domain pertains to autonomous decision-making for dogfighting, especially when dealing with intricate, high-fidelity nonlinear aircraft dynamic models and insufficient information. In response to this challenge, this paper introduces reinforcement learning (RL) to train maneuvering strategies. In the context of RL for dogfighting, the method by which opponents are sampled assumes significance in determining the efficacy of training. Consequently, this paper proposes a novel automatic opponent sampling (AOS)-based RL framework where proximal policy optimization (PPO) is applied. This approach encompasses three pivotal components: a phased opponent policy pool with simulated annealing (SA)-inspired curriculum learning, an SA-inspired Boltzmann Meta-Solver, and a Gate Function based on the sliding window. The training outcomes demonstrate that this improved PPO algorithm with an AOS framework outperforms existing reinforcement learning methods such as the soft actor–critic (SAC) algorithm and the PPO algorithm with prioritized fictitious self-play (PFSP). Moreover, during testing scenarios, the trained maneuvering policy displays remarkable adaptability when confronted with a diverse array of opponents. This research signifies a substantial stride towards the realization of robust autonomous maneuvering decision systems in the context of modern air combat.

KW - air combat

KW - automatic opponent sampling

KW - autonomous decision-making

KW - dogfight

KW - proximal policy optimization

KW - reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=105001130613&partnerID=8YFLogxK

U2 - 10.3390/aerospace12030265

DO - 10.3390/aerospace12030265

M3 - Article

AN - SCOPUS:105001130613

SN - 2226-4310

VL - 12

JO - Aerospace

JF - Aerospace

IS - 3

M1 - 265

ER -

Autonomous Dogfight Decision-Making for Air Combat Based on Reinforcement Learning with Automatic Opponent Sampling

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this