TY - GEN
T1 - A-MAPPO
T2 - 2nd Aerospace Frontiers Conference, AFC 2025
AU - Feng, Zhaohan
AU - Sun, Jian
AU - Wang, Gang
N1 - Publisher Copyright:
© Press of Acta Aeronautica et Astronautica Sinica 2026.
PY - 2026
Y1 - 2026
N2 - In the domain of multi-agent reinforcement learning, the scalability of multi-agent systems presents challenges for conventional policy-based methods. As the scale increases, these methods struggle with the growing state space and the partially observable Markov decision process, difficulties further exacerbated by interference between observations. This paper introduces a novel framework that enhances multi-agent proximal policy optimization with a hard attention network. The features in the observation vector of a particular agent are re-sorted according to their calculated attention values, and only the relatively important ones are preserved and aggregated for decision making. Through these re-sorting and pruning operations based on hard attention, the input space of the actor network is efficiently reduced, leading to faster and more stable learning for the policy and critic networks. Our framework outperforms the vanilla multi-agent proximal policy optimization algorithm on cluster confrontation tasks of various scales and ensures successful training even under extreme observation interference.
AB - In the domain of multi-agent reinforcement learning, the scalability of multi-agent systems presents challenges for conventional policy-based methods. As the scale increases, these methods struggle with the growing state space and the partially observable Markov decision process, difficulties further exacerbated by interference between observations. This paper introduces a novel framework that enhances multi-agent proximal policy optimization with a hard attention network. The features in the observation vector of a particular agent are re-sorted according to their calculated attention values, and only the relatively important ones are preserved and aggregated for decision making. Through these re-sorting and pruning operations based on hard attention, the input space of the actor network is efficiently reduced, leading to faster and more stable learning for the policy and critic networks. Our framework outperforms the vanilla multi-agent proximal policy optimization algorithm on cluster confrontation tasks of various scales and ensures successful training even under extreme observation interference.
KW - Deep reinforcement learning
KW - hard attention mechanism
KW - multi-agent coordination
KW - multi-agent reinforcement learning
KW - multi-agent systems
UR - https://www.scopus.com/pages/publications/105023091740
U2 - 10.1007/978-981-95-2998-8_20
DO - 10.1007/978-981-95-2998-8_20
M3 - Conference contribution
AN - SCOPUS:105023091740
SN - 9789819529971
T3 - Lecture Notes in Mechanical Engineering
SP - 281
EP - 292
BT - Proceedings of the 2nd Aerospace Frontiers Conference, AFC 2025 - Volume V
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 11 April 2025 through 14 April 2025
ER -