Multi-UAV Dynamic Target Search Based on Multi-Potential-Field Fusion Reward Shaping MAPPO

Research output: Contribution to journal › Article › peer-review

Abstract

Highlights

What are the main findings?

  • Proposes MPRS-MAPPO, an adaptive reward shaping method integrating three potential fields, enhancing multi-UAV coordination and learning efficiency in dynamic target search.
  • Achieves a 7.87–29.76% improvement in target detection rate and an 11.58% increase in training return compared to baseline methods.

What are the implications of the main findings?

  • Offers an effective MARL framework for cooperative search under sparse rewards and dynamic conditions.
  • The design enhances efficiency and stability, serving as a reference for other multi-agent systems.

In the cooperative search for dynamic targets by multiple UAVs, target uncertainty and system complexity pose significant challenges to cooperative decision-making. Multi-agent reinforcement learning (MARL) can be used for cooperative policy optimization, but it suffers from convergence difficulties and low policy quality in reward-sparse environments such as dynamic target search. To address this issue, this paper proposes a Multi-Potential-Field Fusion Reward Shaping MAPPO (MPRS-MAPPO) algorithm. First, three potential field functions are constructed for reward shaping: a probability edge potential field, a maximum probability potential field, and a coverage probability sum potential field. Subsequently, an adaptive fusion weight mechanism is proposed to adjust fusion weights based on the correlation between potential field values and advantage values. Furthermore, a warm-up phase is introduced to improve training stability.

Extensive experiments, including multi-scale and physical tests, demonstrate that MPRS-MAPPO significantly improves convergence speed, detection rate, and stability compared with MAPPO, MASAC, QMIX, and Scanline. Detection rates increased by 7.87–29.76%, and training uncertainty decreased by 7.43–56.36%, validating the algorithm’s robustness, scalability, and real-world applicability.
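The abstract does not give the formulas, so the following is only a minimal sketch of how a fused potential-based shaping term with correlation-driven weights and a warm-up phase might look. Everything specific here is an assumption for illustration, not the paper's method: the classic shaping form r + γΦ(s') − Φ(s), the use of Pearson correlation, the softmax mapping from correlations to weights, uniform weights during warm-up, and all names (`pearson`, `fusion_weights`, `shaped_reward`, `warmup_iters`, `temperature`).

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def fusion_weights(potential_batches, advantages, iteration,
                   warmup_iters=10, temperature=1.0):
    """Return one weight per potential field (weights sum to 1).

    potential_batches: list of K lists, each holding one field's values
                       over the same batch of transitions as `advantages`.
    Assumption: uniform weights during warm-up, then a softmax over each
    field's correlation with the advantage estimates.
    """
    k = len(potential_batches)
    if iteration < warmup_iters:
        return [1.0 / k] * k
    corrs = [pearson(p, advantages) for p in potential_batches]
    exps = [math.exp(c / temperature) for c in corrs]
    total = sum(exps)
    return [e / total for e in exps]


def shaped_reward(reward, phis, next_phis, weights, gamma=0.99):
    """Potential-based shaping with a weighted sum of K potentials.

    phis / next_phis: the K potential values at the current / next state.
    """
    phi = sum(w * p for w, p in zip(weights, phis))
    next_phi = sum(w * p for w, p in zip(weights, next_phis))
    return reward + gamma * next_phi - phi
```

In this sketch, a field whose values track the advantage estimates closely receives a larger weight once warm-up ends, while a field that is uncorrelated or anti-correlated with the advantages is down-weighted rather than discarded.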

Original language: English
Article number: 770
Journal: Drones
Volume: 9
Issue number: 11
DOIs
Publication status: Published - Nov 2025
Externally published: Yes

Keywords

  • dynamic target search
  • multi-potential field fusion
  • multi-UAV collaboration
  • reinforcement learning
  • reward shaping
