Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards

Ma Rubing, Wang Bo, Jia Jingyuan, Li Changchun, Dong Hao

Research output: Chapter in book/report/conference proceeding, Conference contribution, Peer-reviewed

Abstract

Multi-agent decision-making is increasingly applied in many settings, especially military applications, but the autonomous decision-making ability of the agents still needs to be improved. Multi-agent Deep Deterministic Policy Gradient (MADDPG) adopts centralized evaluation and decentralized execution, and updates each agent's network parameters based on global state information rather than only the agent's own state, so that the policy networks of all agents are updated toward the global optimum rather than individual optima. In the process of multi-agent cooperative decision-making, an encirclement reward is introduced to guide the agents toward cooperative actions and to alleviate the problem of sparse rewards. First, we define the encirclement. Using Graham's algorithm, we find the effective encirclement formed by N agents. We evaluate the quality of the encirclement by its area and by how difficult it is for the adversary to break through, and design the reward function on this basis. Simulation experiments show that the convergence speed and win rate of the MADDPG algorithm with encirclement rewards are significantly improved, and that the algorithm adapts well to a variety of task scenarios.
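The geometric part of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the reward formula, so the function names (`graham_scan`, `strictly_inside`, `encirclement_reward`), the weights `w_area` and `w_gap`, and the assumption that a smaller hull area and a smaller largest gap between neighbouring agents make a better encirclement are all hypothetical choices made for illustration. Only the use of Graham's algorithm to find the hull of the agents' positions, and the two quality criteria (area and break-through difficulty), come from the text.

```python
import math
from functools import cmp_to_key


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points):
    """Convex hull of 2-D points in counter-clockwise order (Graham's algorithm)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    pivot = min(pts, key=lambda p: (p[1], p[0]))  # lowest point, then leftmost

    def dist2(p):
        return (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2

    def by_angle(a, b):
        c = cross(pivot, a, b)
        if c > 0:
            return -1  # a has the smaller polar angle around the pivot
        if c < 0:
            return 1
        return dist2(a) - dist2(b)  # same angle: nearer point first

    rest = sorted((p for p in pts if p != pivot), key=cmp_to_key(by_angle))
    hull = [pivot]
    for p in rest:
        # Pop points that would create a clockwise or straight turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    n = len(poly)
    if n < 3:
        return 0.0
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def strictly_inside(hull, q):
    """True if q lies strictly inside a counter-clockwise convex hull."""
    n = len(hull)
    return n >= 3 and all(cross(hull[i], hull[(i + 1) % n], q) > 0 for i in range(n))


def encirclement_reward(agents, adversary, w_area=1.0, w_gap=1.0):
    """Hypothetical reward: zero unless the adversary is enclosed; otherwise
    larger for a smaller hull area and a smaller largest gap (assumed proxy
    for how hard it is to break through)."""
    hull = graham_scan(agents)
    if not strictly_inside(hull, adversary):
        return 0.0
    n = len(hull)
    max_gap = max(math.dist(hull[i], hull[(i + 1) % n]) for i in range(n))
    return w_area / (1.0 + polygon_area(hull)) + w_gap / (1.0 + max_gap)
```

For instance, `encirclement_reward([(0, 0), (4, 0), (4, 4), (0, 4)], (2, 2))` gives a positive value because the adversary is enclosed, while moving the adversary to `(5, 5)` gives `0.0`. A dense shaping term of this kind is one way such a reward could ease the sparse-reward problem the abstract mentions.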

Original language: English
Title of host publication: 2023 42nd Chinese Control Conference, CCC 2023
Publisher: IEEE Computer Society
Pages: 8306-8311
Number of pages: 6
ISBN (electronic): 9789887581543
DOI
Publication status: Published - 2023
Event: 42nd Chinese Control Conference, CCC 2023 - Tianjin, China
Duration: 24 Jul 2023 to 26 Jul 2023

Publication series

Name: Chinese Control Conference, CCC
Volume: 2023-July
ISSN (print): 1934-1768
ISSN (electronic): 2161-2927

Conference

Conference: 42nd Chinese Control Conference, CCC 2023
Country/Territory: China
City: Tianjin
Period: 24/07/23 to 26/07/23
