Abstract
Highlights

What are the main findings?
- A low-complexity multi-agent soft actor–critic (MASAC) framework is developed for heterogeneous UAV swarms under the centralized training–decentralized execution (CTDE) paradigm.
- The proposed method integrates parameter sharing, device identity embeddings, and a shared-backbone twin critic to eliminate linear parameter growth with swarm size while maintaining policy diversity and convergence stability.

What are the implications of the main findings?
- The optimized framework achieves over 14× parameter compression and an approximately 93% reduction in training time, without degrading optimization performance in large-scale UAV clusters.
- This work enables scalable, real-time deployment of multi-agent reinforcement learning in large-model-driven UAV systems for communication, sensing, and cooperative resource scheduling.

Heterogeneous unmanned aerial vehicle (UAV) swarms are becoming critical components of next-generation non-terrestrial networks, enabling tasks such as communication relay, spectrum monitoring, cooperative sensing, and navigation. However, their heterogeneity and multifunctionality pose severe challenges for task allocation and resource scheduling, where traditional multi-agent reinforcement learning methods often suffer from high algorithmic complexity, long training times, and difficult deployment on resource-constrained nodes.
To address these issues, this paper proposes a low-complexity multi-agent soft actor–critic (MASAC) framework within the CTDE paradigm that combines three elements: parameter sharing (a shared actor with device embeddings and shared-backbone twin critics), a lightweight network design (fixed-width residual MLPs with normalization), and robust training mechanisms (minimum-bias twin-critic updates and entropy scheduling). Simulation results show that the proposed framework achieves more than 14-fold parameter compression and reduces training time by over 93%, while maintaining or improving performance as measured by the delay–energy utility function. These advances substantially reduce computational overhead and accelerate convergence, offering a practical pathway for deploying multi-agent reinforcement learning in large-scale heterogeneous UAV clusters and supporting diverse mission scenarios under stringent resource and latency constraints.
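To make the parameter-sharing argument concrete, the following Python sketch counts parameters for independent per-agent actors versus one shared actor conditioned on a device embedding, and for two separate critics versus one backbone with two Q heads. It also shows the standard SAC clipped double-Q target (taking the minimum of the twin target critics) and a linear entropy-coefficient anneal. Everything here is a hypothetical illustration under assumed sizes (50 agents, 32-dimensional observations, 4-dimensional actions, a 128-unit, 3-layer residual MLP with LayerNorm, 8-dimensional embeddings); the function names, dimensions, and the linear schedule are assumptions, not the paper's implementation, and the resulting ratio is not the paper's reported 14× figure.

```python
"""Hypothetical sketch: parameter counts and a twin-critic soft target.

All dimensions and the linear entropy schedule are illustrative assumptions,
not values or choices taken from the MASAC paper.
"""


def mlp_params(in_dim: int, hidden: int, depth: int, out_dim: int) -> int:
    """Parameters of a fixed-width residual MLP with LayerNorm.

    Input projection, (depth - 1) residual blocks, then a linear head.
    Residual skips add no parameters because every block keeps width `hidden`.
    """
    ln = 2 * hidden  # LayerNorm gain + bias after each hidden layer
    total = in_dim * hidden + hidden + ln                    # input projection
    total += (depth - 1) * (hidden * hidden + hidden + ln)   # residual blocks
    total += hidden * out_dim + out_dim                      # output head
    return total


def independent_actor_params(n_agents, obs_dim, act_dim, hidden, depth):
    # One Gaussian policy per agent; head outputs mean and log-std (2 * act_dim).
    return n_agents * mlp_params(obs_dim, hidden, depth, 2 * act_dim)


def shared_actor_params(n_agents, obs_dim, act_dim, hidden, depth, emb_dim):
    # One policy for all agents; input is [observation, device embedding].
    # Only the embedding table (emb_dim per agent) grows with swarm size.
    return mlp_params(obs_dim + emb_dim, hidden, depth, 2 * act_dim) + n_agents * emb_dim


def twin_critic_params(in_dim, hidden, depth, shared_backbone):
    if shared_backbone:
        # One backbone plus a second scalar Q head (mlp_params already counts one head).
        return mlp_params(in_dim, hidden, depth, 1) + (hidden + 1)
    return 2 * mlp_params(in_dim, hidden, depth, 1)  # two fully separate critics


def soft_td_target(reward, done, q1_next, q2_next, logp_next, alpha, gamma=0.99):
    # Clipped double-Q: use the smaller target critic to curb overestimation,
    # then subtract the entropy term alpha * log pi(a'|s') as in SAC.
    q_min = min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * (q_min - alpha * logp_next)


def alpha_schedule(step, total_steps, alpha_start=0.2, alpha_end=0.01):
    # Hypothetical linear anneal of the entropy coefficient.
    frac = min(step / total_steps, 1.0)
    return alpha_start + frac * (alpha_end - alpha_start)


if __name__ == "__main__":
    n, obs, act, hid, depth, emb = 50, 32, 4, 128, 3, 8
    indep = independent_actor_params(n, obs, act, hid, depth)
    shared = shared_actor_params(n, obs, act, hid, depth, emb)
    print(f"actors: independent={indep}, shared={shared}, ratio={indep / shared:.1f}x")

    critic_in = n * (obs + act)  # centralized critic sees joint observations and actions
    print("critics:", twin_critic_params(critic_in, hid, depth, False),
          "vs", twin_critic_params(critic_in, hid, depth, True))
    print("target:", soft_td_target(1.0, 0.0, 5.0, 4.0, -1.0, alpha_schedule(0, 100)))
```

With these assumed sizes, adding an agent to the shared design costs only `emb_dim` extra actor parameters instead of a full network, which is the effect the abstract attributes to parameter sharing. Keeping the hidden width fixed is what lets each residual skip be a plain addition with no projection weights. Note that in this sketch the centralized critic's input layer still scales with the joint observation-action size, so the savings there come only from sharing the backbone.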
| Original language | English |
|---|---|
| Article number | 788 |
| Journal | Drones |
| Volume | 9 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - Nov 2025 |
| Externally published | Yes |
Keywords
- centralized training–decentralized execution (CTDE)
- low complexity
- multi-agent soft actor–critic (MASAC)
- resource allocation
- unmanned aerial vehicle (UAV) cluster