TY - GEN
T1 - Continuous Policy Multi-Agent Deep Reinforcement Learning with Generalizable Episodic Memory
AU - Ni, Wenjing
AU - Wang, Bo
AU - Zhong, Hua
AU - Guo, Xiang
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Multi-agent reinforcement learning (MARL) has long been plagued by low sample efficiency: it requires far more samples than human learning to converge and to learn successful strategies, and the problem is even more severe in continuous state and policy spaces. Episodic memory (EM) is an effective way to improve the sample efficiency of reinforcement learning (RL) by imitating humans' ability to learn rapidly, but so far it has seen little application to continuous policy spaces and MARL. We therefore propose a continuous-policy multi-agent reinforcement learning method with generalizable episodic memory (ECM). It establishes a centralized memory parameter network and a memory buffer for each agent and updates the memory through implicit planning, so that the episodic memory model can use neural networks to learn successful strategies from past successful experience and thus adapt to continuous policy spaces. Moreover, ECM combines the MARL paradigm of centralized training with decentralized execution (CTDE) with the episodic memory model, adapting it to multi-agent task environments. Simulation results show that ECM effectively improves the sample efficiency of MARL algorithms and that the learned strategies achieve higher accuracy.
AB - Multi-agent reinforcement learning (MARL) has long been plagued by low sample efficiency: it requires far more samples than human learning to converge and to learn successful strategies, and the problem is even more severe in continuous state and policy spaces. Episodic memory (EM) is an effective way to improve the sample efficiency of reinforcement learning (RL) by imitating humans' ability to learn rapidly, but so far it has seen little application to continuous policy spaces and MARL. We therefore propose a continuous-policy multi-agent reinforcement learning method with generalizable episodic memory (ECM). It establishes a centralized memory parameter network and a memory buffer for each agent and updates the memory through implicit planning, so that the episodic memory model can use neural networks to learn successful strategies from past successful experience and thus adapt to continuous policy spaces. Moreover, ECM combines the MARL paradigm of centralized training with decentralized execution (CTDE) with the episodic memory model, adapting it to multi-agent task environments. Simulation results show that ECM effectively improves the sample efficiency of MARL algorithms and that the learned strategies achieve higher accuracy.
KW - continuous policy
KW - generalizable episodic memory
KW - multi-agent reinforcement learning
KW - sample efficiency
UR - http://www.scopus.com/inward/record.url?scp=85151149182&partnerID=8YFLogxK
U2 - 10.1109/CAC57257.2022.10055953
DO - 10.1109/CAC57257.2022.10055953
M3 - Conference contribution
AN - SCOPUS:85151149182
T3 - Proceedings - 2022 Chinese Automation Congress, CAC 2022
SP - 1699
EP - 1704
BT - Proceedings - 2022 Chinese Automation Congress, CAC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 Chinese Automation Congress, CAC 2022
Y2 - 25 November 2022 through 27 November 2022
ER -