Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards

Ma Rubing; Wang Bo; Jia Jingyuan; Li Changchun; Dong Hao

doi:10.23919/CCC58697.2023.10240463

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-agent decision-making is increasingly applied in many settings, especially military applications, but the autonomous decision-making ability of the agents still needs improvement. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) adopts centralized evaluation with decentralized execution: it updates each agent's network parameters based on global state information rather than only the agent's own state, so that the agents' policy networks are updated toward the global optimum rather than an individual optimum. For multi-agent cooperative decision-making, encirclement rewards are introduced to guide agents toward cooperative actions and to alleviate the sparse-reward problem. We first define the encirclement and use Graham's algorithm to find the effective encirclement formed by N agents. We then evaluate encirclement quality by the area of the encirclement and by how difficult it is for the adversary to break through, and design the reward function on this basis. Simulation experiments show that the convergence speed and win rate of the MADDPG algorithm with encirclement rewards are significantly improved, and that the method adapts well to various task scenarios.
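As a rough illustration of the geometric step mentioned in the abstract, the sketch below runs a Graham scan over agent positions to obtain their convex hull, computes the hull's area with the shoelace formula, and checks whether a target point lies inside it. This is not the authors' implementation: the abstract does not give their definition of an "effective encirclement" or their reward function, so the function names (`graham_scan`, `polygon_area`, `encloses`) and the 2-D point representation are assumptions made for this example only.

```python
from functools import cmp_to_key
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def graham_scan(points: List[Point]) -> List[Point]:
    """Convex hull in counter-clockwise order, starting at the lowest point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    # Pivot: lowest point, ties broken by smallest x.
    pivot = min(pts, key=lambda p: (p[1], p[0]))
    rest = [p for p in pts if p != pivot]

    def dist2(p: Point) -> float:
        return (p[0] - pivot[0]) ** 2 + (p[1] - pivot[1]) ** 2

    def by_angle(a: Point, b: Point) -> int:
        # Sort by polar angle around the pivot; nearer points first on ties.
        c = cross(pivot, a, b)
        if c > 0:
            return -1
        if c < 0:
            return 1
        return (dist2(a) > dist2(b)) - (dist2(a) < dist2(b))

    rest.sort(key=cmp_to_key(by_angle))
    hull = [pivot]
    for p in rest:
        # Drop points that would make a right turn or lie on a straight line.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    if len(poly) < 3:
        return 0.0
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def encloses(hull: List[Point], target: Point) -> bool:
    """True if target is inside or on the boundary of a CCW convex hull."""
    if len(hull) < 3:
        return False
    return all(
        cross(hull[i], hull[(i + 1) % len(hull)], target) >= 0
        for i in range(len(hull))
    )


if __name__ == "__main__":
    agents = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]  # hypothetical positions
    hull = graham_scan(agents)
    print(hull, polygon_area(hull), encloses(hull, (2, 1)))
```

A reward built on these quantities would still need the paper's own weighting of area against break-through difficulty, which the abstract does not specify.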

Original language	English
Title of host publication	2023 42nd Chinese Control Conference, CCC 2023
Publisher	IEEE Computer Society
Pages	8306-8311
Number of pages	6
ISBN (Electronic)	9789887581543
DOIs	https://doi.org/10.23919/CCC58697.2023.10240463
Publication status	Published - 2023
Event	42nd Chinese Control Conference, CCC 2023, Tianjin, China, 24 Jul 2023 → 26 Jul 2023

Publication series

Name	Chinese Control Conference, CCC
Volume	2023-July
ISSN (Print)	1934-1768
ISSN (Electronic)	2161-2927

Conference

Conference	42nd Chinese Control Conference, CCC 2023
Country/Territory	China
City	Tianjin
Period	24/07/23 → 26/07/23

Keywords

Encirclement Rewards
Multi-agent Decision-making
Reinforcement Learning
Reward Shaping

Access to Document

10.23919/CCC58697.2023.10240463

Cite this

@inproceedings{0e99771e6bf344999e539034593ec6e3,
title = "Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards",
abstract = "Multi-agent decision-making is increasingly applied on many situations especially on military applications, but their autonomous decision-making ability needs to be improved. Multi-agent Deep Deterministic Policy Gradient (MADDPG) adopts the method of centralized evaluation and decentralized execution, while updates each agent's network parameters based on the global state information rather than only its own state, which can make the entire agent's policy network update in the direction of the global optimum, rather than the individual optimum. In the process of multi-agent cooperation decision-making, the encirclement rewards is introduced to guide agents to make cooperative actions and alleviate the problem of sparse rewards. Firstly, we define the encirclement. By using Graham's algorithm, we find out the effective encirclement of N agents. We evaluate the encirclement quality from the area of the encirclement and the difficulty of breaking-through for the adversary, and then design the rewards function based on this. Simulation experiments show that the convergence speed and win rate of MADDPG algorithm based on encirclement rewards is significantly improved, and it also has strong adaptability to various task scenarios.",
keywords = "Encirclement Rewards, Multi-agent Decision-making, Reinforcement Learning, Reward Shaping",
author = "Ma Rubing and Wang Bo and Jia Jingyuan and Li Changchun and Dong Hao",
note = "Publisher Copyright: {\textcopyright} 2023 Technical Committee on Control Theory, Chinese Association of Automation.; 42nd Chinese Control Conference, CCC 2023 ; Conference date: 24-07-2023 Through 26-07-2023",
year = "2023",
doi = "10.23919/CCC58697.2023.10240463",
language = "English",
series = "Chinese Control Conference, CCC",
publisher = "IEEE Computer Society",
pages = "8306--8311",
booktitle = "2023 42nd Chinese Control Conference, CCC 2023",
address = "United States",
}

Rubing, M, Bo, W, Jingyuan, J, Changchun, L & Hao, D 2023, Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards. in 2023 42nd Chinese Control Conference, CCC 2023. Chinese Control Conference, CCC, vol. 2023-July, IEEE Computer Society, pp. 8306-8311, 42nd Chinese Control Conference, CCC 2023, Tianjin, China, 24/07/23. https://doi.org/10.23919/CCC58697.2023.10240463

TY - GEN
T1 - Multi-Agent Cooperation Decision-Making by Reinforcement Learning with Encirclement Rewards
AU - Rubing, Ma
AU - Bo, Wang
AU - Jingyuan, Jia
AU - Changchun, Li
AU - Hao, Dong
PY - 2023
Y1 - 2023
AB - Multi-agent decision-making is increasingly applied on many situations especially on military applications, but their autonomous decision-making ability needs to be improved. Multi-agent Deep Deterministic Policy Gradient (MADDPG) adopts the method of centralized evaluation and decentralized execution, while updates each agent's network parameters based on the global state information rather than only its own state, which can make the entire agent's policy network update in the direction of the global optimum, rather than the individual optimum. In the process of multi-agent cooperation decision-making, the encirclement rewards is introduced to guide agents to make cooperative actions and alleviate the problem of sparse rewards. Firstly, we define the encirclement. By using Graham's algorithm, we find out the effective encirclement of N agents. We evaluate the encirclement quality from the area of the encirclement and the difficulty of breaking-through for the adversary, and then design the rewards function based on this. Simulation experiments show that the convergence speed and win rate of MADDPG algorithm based on encirclement rewards is significantly improved, and it also has strong adaptability to various task scenarios.
KW - Encirclement Rewards
KW - Multi-agent Decision-making
KW - Reinforcement Learning
KW - Reward Shaping
UR - http://www.scopus.com/inward/record.url?scp=85175546744&partnerID=8YFLogxK
U2 - 10.23919/CCC58697.2023.10240463
DO - 10.23919/CCC58697.2023.10240463
M3 - Conference contribution
AN - SCOPUS:85175546744
T3 - Chinese Control Conference, CCC
SP - 8306
EP - 8311
BT - 2023 42nd Chinese Control Conference, CCC 2023
PB - IEEE Computer Society
T2 - 42nd Chinese Control Conference, CCC 2023
Y2 - 24 July 2023 through 26 July 2023
ER -
