TY - GEN
T1 - Mamba-Driven Strategy for Multi-Agent Games
T2 - 44th Chinese Control Conference, CCC 2025
AU - Qi, Jinming
AU - Min, Pengyuan
AU - Feng, Zhaohan
AU - Wei, Yuzhou
AU - Wang, Gang
AU - Sun, Jian
N1 - Publisher Copyright:
© 2025 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2025
Y1 - 2025
N2 - Multi-agent reinforcement learning (MARL) has shown great potential in addressing cooperation problems within multi-agent systems (MAS), particularly in domains such as autonomous driving, robotic collaboration, and networked systems. This study introduces an innovative MARL algorithm, MambaCoV, which focuses on tackling the challenges of complex interactions between agents and adaptation to dynamic environments. MambaCoV employs a constrained communication framework that facilitates the exchange of essential information among agents, enabling the coordination of actions and the collective optimization of overall performance. Through this mechanism, agents can fairly assess each other's contributions and allocate rewards accordingly, thereby reducing the variance of policy gradient estimates and enhancing the learning efficiency of multi-agent systems. For policy learning, we introduce a network architecture centered on the Mamba model, designed to effectively exploit historical information. Experimental evaluations on cooperative navigation and the StarCraft Multi-Agent Challenge (SMAC) demonstrate that MambaCoV adapts faster and achieves more stable performance improvements in the early stages of training than existing state-of-the-art MARL methods. This result confirms that effective communication and coordination between agents are key factors in improving performance. Moreover, through precise credit assignment, MambaCoV ensures that each agent's contribution is reasonably assessed, enhancing the efficiency and stability of team collaboration.
AB - Multi-agent reinforcement learning (MARL) has shown great potential in addressing cooperation problems within multi-agent systems (MAS), particularly in domains such as autonomous driving, robotic collaboration, and networked systems. This study introduces an innovative MARL algorithm, MambaCoV, which focuses on tackling the challenges of complex interactions between agents and adaptation to dynamic environments. MambaCoV employs a constrained communication framework that facilitates the exchange of essential information among agents, enabling the coordination of actions and the collective optimization of overall performance. Through this mechanism, agents can fairly assess each other's contributions and allocate rewards accordingly, thereby reducing the variance of policy gradient estimates and enhancing the learning efficiency of multi-agent systems. For policy learning, we introduce a network architecture centered on the Mamba model, designed to effectively exploit historical information. Experimental evaluations on cooperative navigation and the StarCraft Multi-Agent Challenge (SMAC) demonstrate that MambaCoV adapts faster and achieves more stable performance improvements in the early stages of training than existing state-of-the-art MARL methods. This result confirms that effective communication and coordination between agents are key factors in improving performance. Moreover, through precise credit assignment, MambaCoV ensures that each agent's contribution is reasonably assessed, enhancing the efficiency and stability of team collaboration.
KW - Communication mechanism
KW - Credit assignment
KW - Multi-agent reinforcement learning
UR - https://www.scopus.com/pages/publications/105020291169
U2 - 10.23919/CCC64809.2025.11179120
DO - 10.23919/CCC64809.2025.11179120
M3 - Conference contribution
AN - SCOPUS:105020291169
T3 - Chinese Control Conference, CCC
SP - 6137
EP - 6142
BT - Proceedings of the 44th Chinese Control Conference, CCC 2025
A2 - Sun, Jian
A2 - Yin, Hongpeng
PB - IEEE Computer Society
Y2 - 28 July 2025 through 30 July 2025
ER -