TY - JOUR
T1 - Distributed entropy-regularized multi-agent reinforcement learning with policy consensus
AU - Hu, Yifan
AU - Fu, Junjie
AU - Wen, Guanghui
AU - Lv, Yuezu
AU - Ren, Wei
N1 - Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the distributed MARL algorithm design. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where policy parameter sharing among the agents provably preserves optimality. Second, an on-policy distributed actor–critic algorithm is proposed, in which each agent shares the parameters of both its critic and its actor for consensus updates. The convergence of the proposed algorithm is then analyzed based on stochastic approximation theory under the assumption of linear function approximation for the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed, which offers scalability, data efficiency, and learning stability. Finally, the proposed distributed algorithm is compared against solid baselines, including two classic centralized training algorithms, in the multi-agent particle environment, and its learning performance is empirically demonstrated through extensive simulation experiments.
KW - Deep reinforcement learning
KW - Distributed actor–critic algorithm
KW - Entropy regularization
KW - Networked multi-agent system
UR - http://www.scopus.com/inward/record.url?scp=85189507331&partnerID=8YFLogxK
U2 - 10.1016/j.automatica.2024.111652
DO - 10.1016/j.automatica.2024.111652
M3 - Article
AN - SCOPUS:85189507331
SN - 0005-1098
VL - 164
JO - Automatica
JF - Automatica
M1 - 111652
ER -