TY - GEN
T1 - Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards
AU - Rubing, Ma
AU - Bo, Wang
AU - Jingyuan, Jia
AU - Changchun, Li
AU - Hao, Dong
N1 - Publisher Copyright: © 2023 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2023
Y1 - 2023
N2 - Multi-agent decision-making is increasingly applied in many settings, especially military applications, but the autonomous decision-making ability of the agents still needs improvement. Multi-agent Deep Deterministic Policy Gradient (MADDPG) adopts centralized evaluation and decentralized execution: it updates each agent's network parameters using global state information rather than only the agent's own state, which steers the agents' policy networks toward the global optimum rather than individual optima. For multi-agent cooperative decision-making, we introduce encirclement rewards to guide agents toward cooperative actions and to alleviate the sparse-reward problem. We first define the encirclement, then use Graham's algorithm to find the effective encirclement formed by N agents. We evaluate encirclement quality by the area of the encirclement and by how difficult it is for the adversary to break through, and design the reward function on this basis. Simulation experiments show that MADDPG with encirclement rewards achieves significantly improved convergence speed and win rate, and adapts well to various task scenarios.
AB - Multi-agent decision-making is increasingly applied in many settings, especially military applications, but the autonomous decision-making ability of the agents still needs improvement. Multi-agent Deep Deterministic Policy Gradient (MADDPG) adopts centralized evaluation and decentralized execution: it updates each agent's network parameters using global state information rather than only the agent's own state, which steers the agents' policy networks toward the global optimum rather than individual optima. For multi-agent cooperative decision-making, we introduce encirclement rewards to guide agents toward cooperative actions and to alleviate the sparse-reward problem. We first define the encirclement, then use Graham's algorithm to find the effective encirclement formed by N agents. We evaluate encirclement quality by the area of the encirclement and by how difficult it is for the adversary to break through, and design the reward function on this basis. Simulation experiments show that MADDPG with encirclement rewards achieves significantly improved convergence speed and win rate, and adapts well to various task scenarios.
KW - Encirclement Rewards
KW - Multi-agent Decision-making
KW - Reinforcement Learning
KW - Reward Shaping
UR - http://www.scopus.com/inward/record.url?scp=85175546744&partnerID=8YFLogxK
U2 - 10.23919/CCC58697.2023.10240463
DO - 10.23919/CCC58697.2023.10240463
M3 - Conference contribution
AN - SCOPUS:85175546744
T3 - Chinese Control Conference, CCC
SP - 8306
EP - 8311
BT - 2023 42nd Chinese Control Conference, CCC 2023
PB - IEEE Computer Society
T2 - 42nd Chinese Control Conference, CCC 2023
Y2 - 24 July 2023 through 26 July 2023
ER -