An Efficient Policy Gradient Algorithm with Historical Behaviors Reusing in Multi Agent System

Ao Ding, Huaqing Zhang, Hongbin Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In multi-agent reinforcement learning, how efficiently an algorithm samples from historical experience trajectories is regarded as key to improving agent performance. To make fuller use of interaction data, this paper proposes an efficient multi-agent reinforcement learning algorithm: a multi-agent policy gradient algorithm with historical behavior reusing (MAPG-HBR). MAPG-HBR accounts for the influence of historical behavior on the policy during the policy improvement stage, so that the agents can learn an approximately optimal joint policy. To obtain the advantage functions used in MAPG-HBR with only one critic network, the paper also proposes a theoretically interpretable twin universal critic network, which simultaneously estimates the action-value function, the state-value function, and the corresponding target value function for Clipped Double Q-learning. The algorithm is compared with several baselines in Waterworld and Multi-Agent MuJoCo, two widely used multi-agent test environments. The results show that MAPG-HBR outperforms the other algorithms in these environments.
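The abstract does not give the update equations, so the following is only a rough illustration of two standard quantities it names, not the authors' implementation. It sketches the usual Clipped Double Q-learning target (the smaller of two critic estimates for the next state) and an advantage computed as action value minus state value. The function names, the scalar inputs, and the default discount `gamma` are all assumptions made for this example.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Return a TD target built from the smaller of two next-state Q estimates.

    Hypothetical helper: the names and the scalar interface are assumptions,
    not taken from the paper.
    """
    # Taking the minimum of the two critics counteracts overestimation bias.
    min_next = min(q1_next, q2_next)
    # No bootstrapping from the next state once the episode has terminated.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min_next


def advantage(q_value, v_value):
    """Return A(s, a) = Q(s, a) - V(s) for a single state-action pair."""
    return q_value - v_value


if __name__ == "__main__":
    target = clipped_double_q_target(1.0, False, q1_next=2.0, q2_next=3.0, gamma=0.5)
    print(target)
    print(advantage(1.5, 1.0))
```

In a critic that outputs both Q and V, as the abstract describes, the advantage could be read off from a single network rather than from two separately trained estimators; how MAPG-HBR actually does this is not stated here.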

Original language: English
Title of host publication: 2024 International Annual Conference on Complex Systems and Intelligent Science, CSIS-IAC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 585-592
Number of pages: 8
ISBN (Electronic): 9798331504755
Publication status: Published - 2024
Externally published: Yes
Event: 2024 International Annual Conference on Complex Systems and Intelligent Science, CSIS-IAC 2024 - Guangzhou, China
Duration: 20 Sept 2024 to 22 Sept 2024

Publication series

Name: 2024 International Annual Conference on Complex Systems and Intelligent Science, CSIS-IAC 2024

Conference

Conference: 2024 International Annual Conference on Complex Systems and Intelligent Science, CSIS-IAC 2024
Country/Territory: China
City: Guangzhou
Period: 20/09/24 to 22/09/24

Keywords

  • Historical behaviors reusing
  • Multi-agent system
  • Policy gradient
  • Sampling efficiency
