Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Yifan Hu, Junjie Fu*, Guanghui Wen, Yuezu Lv, Wei Ren

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the distributed MARL algorithm design. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where policy parameter sharing among the agents provably preserves optimality. Second, an on-policy distributed actor–critic algorithm is proposed, in which each agent shares the parameters of both its critic and its actor for consensus updates. The convergence of the proposed algorithm is then established using stochastic approximation theory under the assumption of linear function approximation for the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed, which offers scalability, data efficiency and learning stability. Finally, the proposed distributed algorithm is compared against solid baselines, including two classic centralized-training algorithms, in the multi-agent particle environment, and its learning performance is empirically demonstrated through extensive simulation experiments.
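To make the two mechanisms named in the abstract concrete, the following minimal Python sketch combines (i) an entropy-regularized actor–critic update performed locally by each agent and (ii) a consensus step that averages both critic and actor parameters over a communication network. It is an illustrative sketch only, not the paper's algorithm: the ring topology, tabular problem sizes, temperature TAU, step sizes, and all function names are assumptions introduced here for illustration.

```python
# Illustrative sketch (assumed toy setting, not the authors' implementation):
# entropy-regularized actor-critic with consensus averaging of parameters.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_OBS, N_ACT = 4, 5, 3        # assumed toy sizes
TAU = 0.1                               # entropy-regularization temperature (assumed)
GAMMA = 0.95                            # discount factor (assumed)
ALPHA_CRITIC, ALPHA_ACTOR = 0.05, 0.01  # step sizes (assumed)

# Doubly stochastic consensus weight matrix for an assumed ring communication graph.
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_AGENTS] = 0.25
    W[i, (i + 1) % N_AGENTS] = 0.25

# Linear critic weights (tabular features here) and softmax actor logits,
# one copy per agent with identical structure, mirroring the homogeneous-agent
# parameter-sharing setting described in the abstract.
critic = rng.normal(0, 0.1, size=(N_AGENTS, N_OBS))
actor = rng.normal(0, 0.1, size=(N_AGENTS, N_OBS, N_ACT))

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def local_update(i, obs, act, reward, next_obs):
    """Entropy-regularized TD and policy-gradient step for agent i."""
    pi = softmax(actor[i, obs])
    entropy = -np.sum(pi * np.log(pi + 1e-8))
    # Soft (entropy-augmented) TD error with the local linear critic.
    td = reward + TAU * entropy + GAMMA * critic[i, next_obs] - critic[i, obs]
    critic[i, obs] += ALPHA_CRITIC * td
    # Policy-gradient step on the softmax logits, using the TD error as the
    # advantage estimate; the entropy bonus keeps the policy stochastic.
    grad_logp = -pi
    grad_logp[act] += 1.0
    actor[i, obs] += ALPHA_ACTOR * td * grad_logp

def consensus_step():
    """Each agent replaces its parameters by a weighted average over neighbors."""
    global critic, actor
    critic = W @ critic
    actor = np.tensordot(W, actor, axes=1)

# One illustrative iteration with placeholder transitions (fabricated data).
for i in range(N_AGENTS):
    obs, act, next_obs = rng.integers(N_OBS), rng.integers(N_ACT), rng.integers(N_OBS)
    local_update(i, obs, act, reward=rng.normal(), next_obs=next_obs)
consensus_step()
```

In this sketch the consensus step drives all agents' critic and actor parameters toward a common value between local updates, which is the role the parameter-consensus mechanism plays in the on-policy algorithm described above; the paper's off-policy variant and convergence analysis are not reflected here.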

Original language: English
Article number: 111652
Journal: Automatica
Volume: 164
DOI
Publication status: Published - Jun 2024
