TY - JOUR
T1 - Advancing Autonomous BVR Air Combat
T2 - Integrated Strategy Optimization and Adaptive Learning
AU - Wang, Wenfei
AU - Ru, Le
AU - Lv, Maolong
AU - Xi, Hailong
AU - Mo, Li
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - In modern air combat, the complexity of beyond visual range (BVR) engagements stems from the dynamic nature of the game environment, numerous confrontation factors, and continually evolving strategies. These challenges underscore the limitations of traditional algorithms in addressing BVR decision-making problems. To overcome these issues, this paper proposes a decision-making method that combines strategy reuse with demonstration guidance, aiming to enhance the learning efficiency and adaptability of agents in complex environments. Additionally, a role-based agent framework and a collaborative learning mechanism are introduced to diversify strategies and promote group cooperation. By incorporating a strategy training approach driven by strategy entropy, the proposed method further improves the adaptability and robustness of agents in complex BVR air combat scenarios. Simulation results validate the effectiveness and superiority of the approach in BVR air combat scenarios, highlighting its efficiency and stability in the decision-making process. Note to Practitioners - In modern beyond visual range air combat, pilots and UAV systems face a highly complex and dynamic confrontation environment. Traditional decision-making methods (such as rule-based systems or conventional reinforcement learning) often struggle to make flexible, diverse, and adaptive maneuvering decisions in a short time, resulting in tactical rigidity, low learning efficiency, and a lack of strategic diversity in confrontation, which degrades overall operational effectiveness. Therefore, this paper puts forward an intelligent air combat decision-making method based on improved reinforcement learning, which combines strategy reuse with a demonstration guidance mechanism to significantly improve learning efficiency and decision-making adaptability in the initial stage of training. Role-based agent design and a collaborative learning mechanism enhance tactical diversity and confrontation ability. A training method driven by strategy entropy is adopted so that the system remains robust under uncertain confrontation. The decision-making method can be embedded in the intelligent decision-making module of a UAV or a manned/unmanned cooperative air combat system, where it supports real-time tactical generation and dynamic adjustment. It can also be embedded in a confrontation training simulator as an AI opponent against which pilots and other professionals can train. The strategy reuse and demonstration guidance mechanism suits settings with high training cost and low initial exploration efficiency, such as robot control and industrial control, while the role-based multi-agent collaborative learning method can serve as a reference for multi-agent systems with a clear division of tasks and cooperative completion, such as logistics robot scheduling. The simulation results show that the decision-making method outperforms the traditional reinforcement learning method in decision-making speed, cost-effectiveness ratio, and winning rate, and has strong potential for engineering deployment. At present, the method still depends on a high-fidelity simulation environment and is sensitive to model accuracy, and its generalization ability in extreme confrontation scenarios needs further verification. Future work could introduce an online transfer learning mechanism and combined virtual-real training methods to further improve the practicability and transferability of the decision-making method.
KW - Beyond visual range (BVR)
KW - air game
KW - maneuver decision-making
KW - reinforcement learning (RL)
KW - strategy diversity
UR - https://www.scopus.com/pages/publications/105039312808
U2 - 10.1109/TASE.2026.3691170
DO - 10.1109/TASE.2026.3691170
M3 - Article
AN - SCOPUS:105039312808
SN - 1545-5955
VL - 23
SP - 9417
EP - 9435
JO - IEEE Transactions on Automation Science and Engineering
JF - IEEE Transactions on Automation Science and Engineering
ER -