TY - JOUR
T1 - Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO
AU - Hong, Xiaotong
AU - Wang, Zhengjie
AU - Wang, Yue
AU - Xue, Chao
AU - Gao, Yang
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/11
Y1 - 2025/11
AB - Highlights: What are the main findings? Proposes MPRS-MAPPO, an adaptive reward shaping method integrating three potential fields, enhancing multi-UAV coordination and learning efficiency in dynamic target search. Achieves a 7.87–29.76% improvement in target detection rate and an 11.58% increase in training return compared to baseline methods. What are the implications of the main findings? Offers an effective MARL framework for cooperative search under sparse rewards and dynamic conditions. The design enhances efficiency and stability, serving as a reference for other multi-agent systems. In the cooperative search for dynamic targets by multiple UAVs, target uncertainty and system complexity pose significant challenges to cooperative decision-making. Multi-agent reinforcement learning (MARL) technology can be used for cooperative policy optimization, but it suffers from convergence difficulties and low policy quality in reward-sparse environments such as dynamic target search. To address this issue, this paper proposes a Multi-Potential-Field Fusion Reward Shaping MAPPO (MPRS-MAPPO) algorithm. First, three potential field functions are constructed for reward shaping: probability edge potential field, maximum probability potential field, and coverage probability sum potential field. Subsequently, an adaptive fusion weight mechanism is proposed to adjust fusion weights based on the correlation between potential field values and advantage values. Furthermore, a warm-up phase is introduced to improve training stability. Extensive experiments, including multi-scale and physical tests, demonstrate that MPRS-MAPPO significantly improves convergence speed, detection rate, and stability compared with MAPPO, MASAC, QMIX, and Scanline. Detection rates increased by 7.87–29.76%, and training uncertainty decreased by 7.43–56.36%, validating the algorithm’s robustness, scalability, and real-world applicability.
KW - dynamic target search
KW - multi-potential field fusion
KW - multi-UAV collaboration
KW - reinforcement learning
KW - reward shaping
UR - https://www.scopus.com/pages/publications/105023092745
U2 - 10.3390/drones9110770
DO - 10.3390/drones9110770
M3 - Article
AN - SCOPUS:105023092745
SN - 2504-446X
VL - 9
JO - Drones
JF - Drones
IS - 11
M1 - 770
ER -