Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Yifan Hu, Junjie Fu*, Guanghui Wen, Yuezu Lv, Wei Ren

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the distributed MARL algorithm design. Firstly, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where sharing policy parameters among the agents provably preserves optimality. Secondly, an on-policy distributed actor–critic algorithm is proposed, in which each agent shares the parameters of both its critic and its actor for a consensus update. Then, a convergence analysis of the proposed algorithm is provided based on stochastic approximation theory under the assumption of a linear function approximation for the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed, which offers scalability, data efficiency and learning stability. Finally, the proposed distributed algorithm is compared against solid baselines, including two classic centralized-training algorithms, in the multi-agent particle environment, and its learning performance is empirically demonstrated through extensive simulation experiments.
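
To make the algorithmic idea described in the abstract concrete, below is a minimal, illustrative sketch of an entropy-regularized actor–critic update combined with a neighbor-averaging (consensus) step over a communication graph. This is not the paper's exact algorithm: all names, dimensions, hyperparameters and the ring communication topology are assumptions introduced for illustration. The critic is linear, as in the paper's convergence analysis, and the policy is a softmax over linear action preferences.

```
# Minimal sketch (not the paper's exact algorithm): a local entropy-regularized
# actor-critic step per agent, followed by a consensus averaging step over a
# ring communication graph. All names and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n_agents, obs_dim, n_actions = 4, 6, 3
alpha = 0.05                      # entropy-regularization temperature (assumed)
lr_critic, lr_actor, gamma = 0.1, 0.01, 0.95

# Linear critic V(o) = w^T o; softmax actor over linear action preferences.
w = rng.normal(size=(n_agents, obs_dim))                  # critic weights per agent
theta = rng.normal(size=(n_agents, n_actions, obs_dim))   # actor weights per agent

# Doubly stochastic mixing matrix of a ring graph (each agent averages with
# its two neighbors); used for the consensus update of critic and actor.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def local_update(i, obs, action, reward, next_obs):
    """One entropy-regularized actor-critic step for agent i (illustrative)."""
    pi = softmax(theta[i] @ obs)                          # policy probabilities
    entropy = -np.sum(pi * np.log(pi + 1e-8))
    # Entropy-augmented TD error for the regularized objective.
    td = reward + alpha * entropy + gamma * (w[i] @ next_obs) - (w[i] @ obs)
    w[i] += lr_critic * td * obs                          # critic: semi-gradient TD(0)
    # Actor: policy-gradient step; grad log pi(a|o) for a softmax-linear policy.
    grad_logp = np.outer(-pi, obs)
    grad_logp[action] += obs
    theta[i] += lr_actor * td * grad_logp

def consensus_step():
    """Mix critic and actor parameters with neighbors (policy consensus)."""
    global w, theta
    w = W @ w
    theta = np.einsum("ij,jkl->ikl", W, theta)

# Toy rollout with random transitions, just to exercise the two phases.
for step in range(200):
    for i in range(n_agents):
        obs, next_obs = rng.normal(size=obs_dim), rng.normal(size=obs_dim)
        action = rng.integers(n_actions)
        reward = rng.normal()
        local_update(i, obs, action, reward, next_obs)
    consensus_step()
```

The two phases mirror the structure described in the abstract: each agent first performs a local stochastic-approximation step on its own observations, then mixes both critic and actor parameters with its neighbors so that the shared policy parameters of the homogeneous agents are driven toward consensus.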

Original language: English
Article number: 111652
Journal: Automatica
Volume: 164
DOIs
Publication status: Published - Jun 2024

Keywords

  • Deep reinforcement learning
  • Distributed actor–critic algorithm
  • Entropy regularization
  • Networked multi-agent system
