Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards

Ma Rubing, Wang Bo, Jia Jingyuan, Li Changchun, Dong Hao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-agent decision-making is increasingly applied in many settings, especially military applications, but the agents' autonomous decision-making ability still needs improvement. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) uses centralized evaluation with decentralized execution: each agent's network parameters are updated using global state information rather than only the agent's own state, which steers the policy networks of all agents toward the global optimum rather than individual optima. For multi-agent cooperative decision-making, we introduce encirclement rewards to guide agents toward cooperative actions and to alleviate the sparse-reward problem. We first define the encirclement and then use Graham's algorithm to find the effective encirclement formed by N agents. We evaluate encirclement quality by the area of the encirclement and by how difficult it is for the adversary to break through, and we design the reward function on this basis. Simulation experiments show that MADDPG with encirclement rewards converges significantly faster and achieves a significantly higher win rate, and that it adapts well to a variety of task scenarios.
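
The encirclement step described in the abstract can be illustrated with a minimal Python sketch. Graham's algorithm computes the convex hull of a point set, so the sketch takes the hull of the agent positions as the candidate encirclement and checks whether the adversary lies strictly inside it. Everything beyond that is an assumption made for illustration and is not taken from the paper: the function names, the use of the largest gap between neighbouring hull agents as a stand-in for break-through difficulty, the weights, and the reward formula itself are all hypothetical placeholders.

```python
import math
from functools import cmp_to_key


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    # Pivot: lowest point, ties broken by smallest x.
    pivot = min(pts, key=lambda p: (p[1], p[0]))
    rest = [p for p in pts if p != pivot]

    def by_angle(a, b):
        c = cross(pivot, a, b)
        if c != 0:
            return -1 if c > 0 else 1
        # Collinear with the pivot: nearer point first.
        da = (a[0] - pivot[0]) ** 2 + (a[1] - pivot[1]) ** 2
        db = (b[0] - pivot[0]) ** 2 + (b[1] - pivot[1]) ** 2
        return -1 if da < db else (1 if da > db else 0)

    rest.sort(key=cmp_to_key(by_angle))
    hull = [pivot]
    for p in rest:
        # Drop points that would create a clockwise or straight turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def is_enclosed(hull, q):
    """True if q lies strictly inside a counter-clockwise convex polygon."""
    n = len(hull)
    return n >= 3 and all(cross(hull[i], hull[(i + 1) % n], q) > 0 for i in range(n))


def encirclement_reward(agents, adversary, area_weight=1.0, gap_weight=1.0):
    """Hypothetical shaped reward (NOT the paper's formula): zero unless the
    adversary is enclosed, otherwise larger for smaller area and smaller gaps."""
    hull = graham_scan(agents)
    if not is_enclosed(hull, adversary):
        return 0.0
    area = polygon_area(hull)
    # Assumed proxy for break-through difficulty: widest gap between hull neighbours.
    max_gap = max(math.dist(hull[i], hull[(i + 1) % len(hull)])
                  for i in range(len(hull)))
    return 1.0 / (1.0 + area_weight * area + gap_weight * max_gap)


agents = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]
hull = graham_scan(agents)                      # the interior agent (2, 2) is dropped
reward_inside = encirclement_reward(agents, (1, 1))
reward_outside = encirclement_reward(agents, (5, 5))
```

In this sketch an agent that lies inside the hull does not contribute to the encirclement, and an adversary outside the hull yields zero reward. A dense term of this kind is one way a shaped reward could ease the sparse-reward problem mentioned above, but the paper's actual definitions of effective encirclement and encirclement quality may differ.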

Original language: English
Title of host publication: 2023 42nd Chinese Control Conference, CCC 2023
Publisher: IEEE Computer Society
Pages: 8306-8311
Number of pages: 6
ISBN (Electronic): 9789887581543
Publication status: Published - 2023
Event: 42nd Chinese Control Conference, CCC 2023 - Tianjin, China
Duration: 24 Jul 2023 to 26 Jul 2023

Publication series

Name: Chinese Control Conference, CCC
Volume: 2023-July
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 42nd Chinese Control Conference, CCC 2023
Country/Territory: China
City: Tianjin
Period: 24/07/23 to 26/07/23

Keywords

  • Encirclement Rewards
  • Multi-agent Decision-making
  • Reinforcement Learning
  • Reward Shaping
