TY - JOUR
T1 - Dynamic and adaptive learning for autonomous decision-making in beyond visual range air combat
AU - Wang, Wenfei
AU - Ru, Le
AU - Lv, Maolong
AU - Mo, Li
N1 - Publisher Copyright:
© 2025 Elsevier Masson SAS
PY - 2025/8
Y1 - 2025/8
N2 - The environment of beyond-visual-range (BVR) air combat is complex and dynamic, making traditional decision-making methods insufficient for modern combat scenarios. This paper first analyzes the confrontation process in BVR air combat and develops a corresponding decision-making model for air combat. To address the challenge of coupling maneuver and missile launch decisions, we propose a hybrid bifurcation action space design method, allowing for more precise control and improved learning. Additionally, this paper introduces Progressive Opponent Reinforcement Learning (PORL), which incorporates progressively challenging opponents to simulate real-world adversary strategies. Based on the Soft Actor-Critic (SAC) algorithm, this method strengthens the balance between exploration and exploitation through maximum entropy, and dynamically adjusts the opponent's tactics according to the agent's performance, thus improving the agent's learning efficiency and adaptability in the rapidly changing confrontation environment. Furthermore, a dynamic opponent sampling mechanism is designed to select adversaries of varying difficulty levels based on the agent's current performance, ensuring a balanced training process. Simulation results demonstrate that the proposed decision-making framework significantly improves the autonomous decision-making capabilities and countermeasure effectiveness of agents in BVR air combat.
AB - The environment of beyond-visual-range (BVR) air combat is complex and dynamic, making traditional decision-making methods insufficient for modern combat scenarios. This paper first analyzes the confrontation process in BVR air combat and develops a corresponding decision-making model for air combat. To address the challenge of coupling maneuver and missile launch decisions, we propose a hybrid bifurcation action space design method, allowing for more precise control and improved learning. Additionally, this paper introduces Progressive Opponent Reinforcement Learning (PORL), which incorporates progressively challenging opponents to simulate real-world adversary strategies. Based on the Soft Actor-Critic (SAC) algorithm, this method strengthens the balance between exploration and exploitation through maximum entropy, and dynamically adjusts the opponent's tactics according to the agent's performance, thus improving the agent's learning efficiency and adaptability in the rapidly changing confrontation environment. Furthermore, a dynamic opponent sampling mechanism is designed to select adversaries of varying difficulty levels based on the agent's current performance, ensuring a balanced training process. Simulation results demonstrate that the proposed decision-making framework significantly improves the autonomous decision-making capabilities and countermeasure effectiveness of agents in BVR air combat.
KW - Beyond visual range air combat
KW - Maneuver decision-making
KW - Opponent learning
KW - Reinforcement learning (RL)
UR - http://www.scopus.com/inward/record.url?scp=105005206993&partnerID=8YFLogxK
U2 - 10.1016/j.ast.2025.110327
DO - 10.1016/j.ast.2025.110327
M3 - Article
AN - SCOPUS:105005206993
SN - 1270-9638
VL - 163
JO - Aerospace Science and Technology
JF - Aerospace Science and Technology
M1 - 110327
ER -