Multi-time scale hierarchical trust domain leads to the improvement of MAPPO algorithm

Zhentao Guo, Licheng Sun, Guiyu Zhao, Tianhao Wang, Ao Ding, Hongbin Ma*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-Agent Proximal Policy Optimization (MAPPO) is a ubiquitous on-policy reinforcement learning algorithm, yet it is used far less often than off-policy algorithms in multi-agent environments. The existing MAPPO algorithm lacks generalization ability, adaptability, and training stability when dealing with complex tasks. In this paper, we propose an improved, trust-domain-guided MAPPO algorithm with a multi-time-scale hierarchical structure, designed for tasks whose hierarchical structure changes dynamically and that unfold over multiple time scales. The algorithm introduces a multi-time-scale hierarchical structure, together with trust domain constraints and L2-norm regularization that keep overly large policy updates from destabilizing policy performance. Finally, experiments on Decentralized Collective Assault (DCA) show that our algorithm achieves significant improvements across various performance indicators, indicating that it is more effective and robust on complex tasks.
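The abstract does not give the exact form of the trust domain constraint or of the regularized objective. As a rough illustration only, the Python sketch below assumes that the trust domain is enforced with PPO-style probability-ratio clipping and that the L2 term is a squared-norm penalty on the policy parameters. The function names and the parameters `clip_eps` and `l2_coef` are hypothetical and are not taken from the paper. The multi-time-scale hierarchical structure is not modeled here.

```python
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate objective (to be maximized).

    Each ratio is pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] limits how far one update can move the
    policy away from the old one (an assumed stand-in for the paper's
    trust domain constraint).
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and of equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic (lower) bound on the objective.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def l2_penalty(params: Sequence[float]) -> float:
    """Squared L2 norm of the (flattened) policy parameters."""
    return sum(p * p for p in params)


def regularized_policy_loss(ratios: Sequence[float],
                            advantages: Sequence[float],
                            params: Sequence[float],
                            clip_eps: float = 0.2,
                            l2_coef: float = 1e-3) -> float:
    """Hypothetical loss to minimize: negative clipped surrogate plus L2 penalty."""
    return -clipped_surrogate(ratios, advantages, clip_eps) + l2_coef * l2_penalty(params)
```

For example, `regularized_policy_loss([1.5, 0.9], [1.0, -2.0], [0.5, -0.5])` evaluates to about 0.3005. The first ratio is clipped to 1.2, so a large positive advantage cannot push the policy beyond the clipping range, and the L2 term adds 0.001 × 0.5. In a real implementation the ratios, advantages, and parameters would be tensors, and the loss would be differentiated with an autodiff framework.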

Original language: English
Title of host publication: Proceedings of the 43rd Chinese Control Conference, CCC 2024
Editors: Jing Na, Jian Sun
Publisher: IEEE Computer Society
Pages: 6109-6114
Number of pages: 6
ISBN (Electronic): 9789887581581
Publication status: Published, 2024
Event: 43rd Chinese Control Conference, CCC 2024, Kunming, China
Duration: 28 Jul 2024 to 31 Jul 2024

Publication series

Name: Chinese Control Conference, CCC
ISSN (Print): 1934-1768
ISSN (Electronic): 2161-2927

Conference

Conference: 43rd Chinese Control Conference, CCC 2024
Country/Territory: China
City: Kunming
Period: 28/07/24 to 31/07/24

Keywords

  • L2 norm regularization
  • MAPPO
  • Multi-time scale hierarchical structure
  • Trust domain
