TY - GEN
T1 - OMA-QMIX
T2 - 43rd Chinese Control Conference, CCC 2024
AU - Sun, Licheng
AU - Chen, Hui
AU - Guo, Zhentao
AU - Wang, Tianhao
AU - Ding, Ao
AU - Ma, Hongbin
N1 - Publisher Copyright:
© 2024 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2024
Y1 - 2024
N2 - In the real world, many tasks involve multiple agents, such as swarm robotics, drone swarm control, and autonomous vehicle coordination, all of which can be modeled as Multi-Agent Reinforcement Learning (MARL) tasks. Several methods, such as QMIX, have been proposed to address the credit assignment problem and learn cooperative strategies in MARL. Recent variants of the state-of-the-art MARL algorithm QMIX aim to relax QMIX's monotonicity constraint to improve performance on the StarCraft Multi-Agent Challenge (SMAC). However, these methods still lack thorough exploration: agents struggle to identify states worth exploring, making it difficult to coordinate exploration efforts on those states and leading to suboptimal policies. In this paper, we propose an exploration-oriented Multi-Agent Reinforcement Learning framework called OMA-QMIX, in which each agent sets an independent entropy temperature during exploration, selecting targets from multiple projected state spaces to explore the action space and approximate the total state value. Additionally, we utilize Transformers to capture relationships and information from other agents during exploitation, fostering coordination among agents. Experimental results demonstrate that OMA-QMIX significantly outperforms state-of-the-art algorithms in the StarCraft Multi-Agent Challenge. In particular, OMA-QMIX achieves a 100% success rate on almost all Hard and Super Hard SMAC maps.
AB - In the real world, many tasks involve multiple agents, such as swarm robotics, drone swarm control, and autonomous vehicle coordination, all of which can be modeled as Multi-Agent Reinforcement Learning (MARL) tasks. Several methods, such as QMIX, have been proposed to address the credit assignment problem and learn cooperative strategies in MARL. Recent variants of the state-of-the-art MARL algorithm QMIX aim to relax QMIX's monotonicity constraint to improve performance on the StarCraft Multi-Agent Challenge (SMAC). However, these methods still lack thorough exploration: agents struggle to identify states worth exploring, making it difficult to coordinate exploration efforts on those states and leading to suboptimal policies. In this paper, we propose an exploration-oriented Multi-Agent Reinforcement Learning framework called OMA-QMIX, in which each agent sets an independent entropy temperature during exploration, selecting targets from multiple projected state spaces to explore the action space and approximate the total state value. Additionally, we utilize Transformers to capture relationships and information from other agents during exploitation, fostering coordination among agents. Experimental results demonstrate that OMA-QMIX significantly outperforms state-of-the-art algorithms in the StarCraft Multi-Agent Challenge. In particular, OMA-QMIX achieves a 100% success rate on almost all Hard and Super Hard SMAC maps.
KW - Deep Learning
KW - Multi-Agent
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85205497096&partnerID=8YFLogxK
U2 - 10.23919/CCC63176.2024.10662294
DO - 10.23919/CCC63176.2024.10662294
M3 - Conference contribution
AN - SCOPUS:85205497096
T3 - Chinese Control Conference, CCC
SP - 8194
EP - 8199
BT - Proceedings of the 43rd Chinese Control Conference, CCC 2024
A2 - Na, Jing
A2 - Sun, Jian
PB - IEEE Computer Society
Y2 - 28 July 2024 through 31 July 2024
ER -