TY - GEN
T1 - Continuous Policy Multi-Agent Deep Reinforcement Learning with Generalizable Episodic Memory
AU - Ni, Wenjing
AU - Wang, Bo
AU - Zhong, Hua
AU - Guo, Xiang
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Multi-agent reinforcement learning (MARL) has long been plagued by low sample efficiency: it requires far more samples than human learning to converge and to learn successful strategies, and the problem is even more severe in continuous state and policy spaces. Episodic memory (EM) is an effective way to improve the sample efficiency of reinforcement learning (RL) by imitating humans' ability to learn rapidly, but so far it has seen little application to continuous policy spaces and MARL. We therefore propose a continuous-policy multi-agent reinforcement learning method with generalizable episodic memory (ECM). It establishes a centralized memory parameter network and a memory buffer for each agent and updates the memory through implicit planning, so that the episodic memory model can use neural networks to learn successful strategies from past successful experience and thus adapt to continuous policy spaces. Moreover, ECM combines the MARL paradigm of centralized training with decentralized execution (CTDE) with the episodic memory model, adapting it to multi-agent task environments. Simulation results show that ECM effectively improves the sample efficiency of MARL algorithms and that the learned strategies achieve higher accuracy.
AB - Multi-agent reinforcement learning (MARL) has long been plagued by low sample efficiency: it requires far more samples than human learning to converge and to learn successful strategies, and the problem is even more severe in continuous state and policy spaces. Episodic memory (EM) is an effective way to improve the sample efficiency of reinforcement learning (RL) by imitating humans' ability to learn rapidly, but so far it has seen little application to continuous policy spaces and MARL. We therefore propose a continuous-policy multi-agent reinforcement learning method with generalizable episodic memory (ECM). It establishes a centralized memory parameter network and a memory buffer for each agent and updates the memory through implicit planning, so that the episodic memory model can use neural networks to learn successful strategies from past successful experience and thus adapt to continuous policy spaces. Moreover, ECM combines the MARL paradigm of centralized training with decentralized execution (CTDE) with the episodic memory model, adapting it to multi-agent task environments. Simulation results show that ECM effectively improves the sample efficiency of MARL algorithms and that the learned strategies achieve higher accuracy.
KW - continuous policy
KW - generalizable episodic memory
KW - multi-agent reinforcement learning
KW - sample efficiency
UR - http://www.scopus.com/inward/record.url?scp=85151149182&partnerID=8YFLogxK
U2 - 10.1109/CAC57257.2022.10055953
DO - 10.1109/CAC57257.2022.10055953
M3 - Conference contribution
AN - SCOPUS:85151149182
T3 - Proceedings - 2022 Chinese Automation Congress, CAC 2022
SP - 1699
EP - 1704
BT - Proceedings - 2022 Chinese Automation Congress, CAC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 Chinese Automation Congress, CAC 2022
Y2 - 25 November 2022 through 27 November 2022
ER -