
PRIME: Policy Representation Integration With Metavalue-Modulated Evolution in Multiagent Reinforcement Learning

  • Licheng Sun*
  • Hongbin Ma
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • National University of Singapore

Research output: Contribution to journal › Article › peer-review

Abstract

Multiagent reinforcement learning (MARL) remains fundamentally challenged by partial observability, unstable value learning, and inefficient exploration, difficulties that intensify in high-dimensional robotic control and large-scale coordination scenarios. Moreover, existing algorithms lack a mechanism for guiding long-term policy improvement. We propose policy representation integration for metavalue evolution (PRIME), a MARL framework that addresses these limitations through representation-asymmetric policy parameterization, metavalue-augmented optimization, and metavalue-modulated evolutionary search. PRIME constructs a shared nonlinear encoder with lightweight team-specific linear heads, providing a coherent latent policy manifold that supports both fine-grained robotic manipulation and large-population coordination. A learned metavalue function estimates the long-horizon utility of policy updates, and its gradients shape both actor learning and representation formation. In parallel, two evolutionary operators, direction-aware crossover and metagradient-scaled low-rank mutation, enable globally diverse yet strategically targeted exploration in policy space. Evaluations on multiagent MuJoCo (MA-MuJoCo), DexHands dexterous manipulation, and the large-scale decentralized collective assault (DCA) benchmark show that PRIME consistently outperforms state-of-the-art baselines, converging faster and proving more robust.
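The abstract describes the representation-asymmetric parameterization and the low-rank mutation only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The names `SharedEncoderPolicy`, `low_rank_mutation`, `meta_grad_norm`, and `base_sigma` are hypothetical, as are the tanh encoder, the Gaussian initialization, and the rule that turns a metagradient norm into a mutation scale. It illustrates two ideas: one nonlinear encoder is shared by every team while each team owns only a linear head, and a rank-r perturbation U V^T changes every weight of a matrix while sampling only (rows + cols) * r random numbers.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class SharedEncoderPolicy:
    """Hypothetical sketch: one shared nonlinear encoder, one linear head per team."""

    def __init__(self, obs_dim, latent_dim, act_dim, teams, seed=0):
        rng = random.Random(seed)
        # Shared encoder weights (latent_dim x obs_dim), reused by every team.
        self.encoder = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                        for _ in range(latent_dim)]
        # Lightweight team-specific linear heads (act_dim x latent_dim each).
        self.heads = {
            team: [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)]
                   for _ in range(act_dim)]
            for team in teams
        }

    def encode(self, obs):
        # Nonlinear shared latent; tanh keeps each feature in (-1, 1).
        return [math.tanh(v) for v in matvec(self.encoder, obs)]

    def act(self, team, obs):
        # Linear readout of the shared latent through the team's own head.
        return matvec(self.heads[team], self.encode(obs))


def low_rank_mutation(W, rank, meta_grad_norm, base_sigma, rng):
    """Return W + scale * U V^T / sqrt(rank) with Gaussian U, V.

    ASSUMPTION: the scale grows with a scalar metagradient norm and is bounded
    by base_sigma. PRIME's actual scaling rule is not given in the abstract.
    """
    rows, cols = len(W), len(W[0])
    scale = base_sigma * meta_grad_norm / (1.0 + meta_grad_norm)
    U = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(rows)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(cols)]
    return [
        [W[i][j] + scale * sum(U[i][k] * V[j][k] for k in range(rank)) / math.sqrt(rank)
         for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    policy = SharedEncoderPolicy(obs_dim=4, latent_dim=3, act_dim=2, teams=["red", "blue"])
    obs = [0.5, -0.2, 0.1, 0.3]
    print("red:", policy.act("red", obs), "blue:", policy.act("blue", obs))
    # Mutate only the red team's head; the shared encoder stays untouched.
    policy.heads["red"] = low_rank_mutation(
        policy.heads["red"], rank=1, meta_grad_norm=2.0, base_sigma=0.05,
        rng=random.Random(1))
    print("red after mutation:", policy.act("red", obs))
```

Mutating only a team head, as in the usage lines, keeps the shared latent fixed for all other teams; whether PRIME mutates the encoder, the heads, or both is not stated in the abstract, so this choice is illustrative only.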

Original language: English
Pages (from-to): 24893-24911
Number of pages: 19
Journal: IEEE Internet of Things Journal
Volume: 13
Issue number: 11
Publication status: Published - 2026

Keywords

  • Deep learning
  • large-scale multiagent systems (LMASs)
  • metavalue
  • multiagent reinforcement learning (MARL)
  • multirobot control
