TY - JOUR
T1 - Mean policy-based proximal policy optimization for maneuvering decision in multi-UAV air combat
AU - Zheng, Yifan
AU - Xin, Bin
AU - He, Bin
AU - Ding, Yulong
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
PY - 2024/11
Y1 - 2024/11
N2 - Autonomous maneuvering decision-making is a crucial technology for Unmanned Aerial Vehicles (UAVs) to achieve air domination in modern unmanned warfare. With its ability to balance exploration and exploitation, and the immediacy of end-to-end output when combined with deep neural networks, multi-agent reinforcement learning (MARL) has achieved remarkable results in multi-UAV autonomous air combat maneuvering decision-making (MUAAMD). However, effective cooperative policy learning remains a challenge for MARL methods under the centralized training with decentralized execution (CTDE) paradigm. This paper proposes a MARL-based method to improve cooperation in MUAAMD. First, considering the dynamic and limited perception constraints of UAVs in realistic air combat scenarios, the MUAAMD problem is formulated as a partially observable Markov game (POMG). Second, a novel and efficient MARL algorithm, mean policy-based proximal policy optimization (MP3O), is introduced. Specifically, a joint policy optimization mechanism is constructed by estimating the policies of neighboring agents in a group as a mean-field approximation during training, which enables both centralized evaluation and improvement of the cooperative policy under the CTDE paradigm. Third, by incorporating three improvement techniques, an MP3O-based cooperative decision-making framework for MUAAMD is proposed. Empirically, simulations and comparative experiments validate the effectiveness of the proposed method in promoting cooperative policy learning for the MUAAMD problem.
AB - Autonomous maneuvering decision-making is a crucial technology for Unmanned Aerial Vehicles (UAVs) to achieve air domination in modern unmanned warfare. With its ability to balance exploration and exploitation, and the immediacy of end-to-end output when combined with deep neural networks, multi-agent reinforcement learning (MARL) has achieved remarkable results in multi-UAV autonomous air combat maneuvering decision-making (MUAAMD). However, effective cooperative policy learning remains a challenge for MARL methods under the centralized training with decentralized execution (CTDE) paradigm. This paper proposes a MARL-based method to improve cooperation in MUAAMD. First, considering the dynamic and limited perception constraints of UAVs in realistic air combat scenarios, the MUAAMD problem is formulated as a partially observable Markov game (POMG). Second, a novel and efficient MARL algorithm, mean policy-based proximal policy optimization (MP3O), is introduced. Specifically, a joint policy optimization mechanism is constructed by estimating the policies of neighboring agents in a group as a mean-field approximation during training, which enables both centralized evaluation and improvement of the cooperative policy under the CTDE paradigm. Third, by incorporating three improvement techniques, an MP3O-based cooperative decision-making framework for MUAAMD is proposed. Empirically, simulations and comparative experiments validate the effectiveness of the proposed method in promoting cooperative policy learning for the MUAAMD problem.
KW - Autonomous air combat
KW - Mean-field reinforcement learning
KW - Multi-agent system
KW - Proximal policy optimization
UR - http://www.scopus.com/inward/record.url?scp=85200684750&partnerID=8YFLogxK
U2 - 10.1007/s00521-024-10261-8
DO - 10.1007/s00521-024-10261-8
M3 - Article
AN - SCOPUS:85200684750
SN - 0941-0643
VL - 36
SP - 19667
EP - 19690
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 31
ER -