Continuous Policy Multi-Agent Deep Reinforcement Learning with Generalizable Episodic Memory

Wenjing Ni*, Bo Wang, Hua Zhong, Xiang Guo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-agent reinforcement learning (MARL) suffers from low sample efficiency: it needs far more samples than human learning to converge and learn successful strategies, and the problem is more severe in continuous state and policy spaces. Episodic memory (EM) improves the sample efficiency of reinforcement learning (RL) by imitating the human ability to learn rapidly, but it has so far received little attention in continuous policy spaces and in MARL. We therefore propose a continuous policy multi-agent reinforcement learning method with generalizable episodic memory (ECM). ECM establishes a centralized memory parameter network and a memory buffer for each agent and updates the memory through implicit planning, so that the episodic memory model can use neural networks to learn successful strategies from past successful experience and thus adapt to the continuous policy space. Moreover, ECM combines the MARL paradigm of centralized training with decentralized execution (CTDE) with the episodic memory model, so that the model adapts to multi-agent task environments. Simulation results show that ECM effectively improves the sample efficiency of the MARL algorithm and that the learned strategy has higher accuracy.
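
The abstract does not spell out the memory update rule. As a rough illustration only, the sketch below uses one form of implicit planning known from the generalizable episodic memory literature, in which the stored return for a step is backed up as R_t = r_t + γ · max(R_{t+1}, Q(s_{t+1}, a_{t+1})), with R_T = r_T at the final step. Whether ECM uses exactly this recursion is an assumption, and the names `implicit_planning_targets`, `update_agent_memories`, and `next_q_values` are hypothetical. The critic estimates are passed in as plain numbers, standing in for the output of a centralized memory network.

```python
from typing import Dict, List, Sequence


def implicit_planning_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Back up memory targets along one finished episode.

    rewards[t] is r_t. next_q_values[t] is an assumed critic estimate
    Q(s_{t+1}, a_{t+1}); the last entry is ignored because the final
    step has no successor.
    """
    if len(rewards) != len(next_q_values):
        raise ValueError("rewards and next_q_values must have equal length")
    horizon = len(rewards)
    targets = [0.0] * horizon
    for t in reversed(range(horizon)):
        if t == horizon - 1:
            # Final step: the target is just the last reward.
            targets[t] = rewards[t]
        else:
            # Take the better of the stored return and the critic estimate.
            best_future = max(targets[t + 1], next_q_values[t])
            targets[t] = rewards[t] + gamma * best_future
    return targets


def update_agent_memories(
    episodes: Dict[str, Dict[str, Sequence[float]]],
    gamma: float = 0.99,
) -> Dict[str, List[float]]:
    """Compute one list of memory targets per agent (one buffer each)."""
    return {
        agent: implicit_planning_targets(
            data["rewards"], data["next_q_values"], gamma
        )
        for agent, data in episodes.items()
    }


if __name__ == "__main__":
    demo = {
        "agent_0": {"rewards": [0.0, 0.0, 1.0], "next_q_values": [0.5, 2.0, 0.0]},
        "agent_1": {"rewards": [1.0, 0.0], "next_q_values": [0.0, 0.0]},
    }
    print(update_agent_memories(demo, gamma=0.5))
```

In a CTDE setting, the critic values would presumably be computed during training from joint observations and actions, while each agent acts from its own observation at execution time; the sketch leaves that network out entirely.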

Original language: English
Title of host publication: Proceedings - 2022 Chinese Automation Congress, CAC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1699-1704
Number of pages: 6
ISBN (Electronic): 9781665465335
DOIs: https://doi.org/10.1109/CAC57257.2022.10055953
Publication status: Published - 2022
Event: 2022 Chinese Automation Congress, CAC 2022 - Xiamen, China
Duration: 25 Nov 2022 to 27 Nov 2022

Publication series

Name: Proceedings - 2022 Chinese Automation Congress, CAC 2022
Volume: 2022-January

Conference

Conference: 2022 Chinese Automation Congress, CAC 2022
Country/Territory: China
City: Xiamen
Period: 25/11/22 to 27/11/22

Keywords

  • continuous policy
  • generalizable episodic memory
  • multi-agent reinforcement learning
  • sample efficiency

Cite this

Ni, W., Wang, B., Zhong, H., & Guo, X. (2022). Continuous Policy Multi-Agent Deep Reinforcement Learning with Generalizable Episodic Memory. In Proceedings - 2022 Chinese Automation Congress, CAC 2022 (pp. 1699-1704). (Proceedings - 2022 Chinese Automation Congress, CAC 2022; Vol. 2022-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CAC57257.2022.10055953