
PRIME: Policy Representation Integration With Metavalue-Modulated Evolution in Multiagent Reinforcement Learning

  • Beijing Institute of Technology
  • National University of Singapore

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Multiagent reinforcement learning (MARL) remains fundamentally challenged by partial observability, unstable value learning, and inefficient exploration, difficulties that intensify in high-dimensional robotic control and large-scale coordination scenarios. Moreover, existing algorithms lack a mechanism to guide long-term strategy improvement. We propose policy representation integration for metavalue evolution (PRIME), a MARL method that addresses these limitations through representation-asymmetric policy parameterization, metavalue-augmented optimization, and metavalue-modulated evolutionary search. PRIME constructs a shared nonlinear encoder with lightweight team-specific linear heads, providing a coherent latent policy manifold that supports both fine-grained robotic manipulation and large-population coordination. A learned metavalue function estimates the long-horizon utility of policy updates, and its gradients shape both actor learning and representation formation. In parallel, two evolutionary operators (direction-aware crossover and metagradient-scaled low-rank mutation) enable globally diverse yet strategically targeted exploration in policy space. Evaluations on multiagent MuJoCo (MA-MuJoCo), DexHands dexterous manipulation, and the large-scale decentralized collective assault (DCA) benchmark demonstrate that PRIME achieves consistently superior performance, faster convergence, and stronger robustness than state-of-the-art baselines.
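
The abstract gives no implementation details, so the following is only a minimal sketch of how a shared nonlinear encoder with per-team linear heads, plus a low-rank mutation of one head, might be laid out. All names (`SharedEncoderPolicy`, `low_rank_mutation`), the single tanh layer, the dimensions, and the plain `scale` argument are assumptions made for illustration. In particular, `scale` is a stand-in for the metagradient-derived scaling mentioned in the abstract, which is not specified here, and the sketch does not cover the metavalue function or the crossover operator.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, std=0.1):
    """Gaussian-initialised rows x cols matrix."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class SharedEncoderPolicy:
    """Hypothetical layout: one shared nonlinear encoder, one linear head per team."""

    def __init__(self, obs_dim, latent_dim, act_dim, n_teams, seed=0):
        rng = random.Random(seed)
        # Shared by every team (assumed here to be a single tanh layer).
        self.encoder = rand_matrix(latent_dim, obs_dim, rng)
        # Lightweight team-specific parameters: one linear map per team.
        self.heads = [rand_matrix(act_dim, latent_dim, rng) for _ in range(n_teams)]

    def encode(self, obs):
        return [math.tanh(z) for z in matvec(self.encoder, obs)]

    def act(self, team, obs):
        # Same latent code for all teams; only the linear head differs.
        return matvec(self.heads[team], self.encode(obs))


def low_rank_mutation(head, rank, scale, rng):
    """Return head + scale * U V^T, where U is (rows x rank) and V is (cols x rank).

    `scale` is a plain number in this sketch; it stands in for the
    metagradient-based scaling, whose form is not given in the abstract.
    """
    rows, cols = len(head), len(head[0])
    U = rand_matrix(rows, rank, rng, std=1.0)
    V = rand_matrix(cols, rank, rng, std=1.0)
    return [
        [head[i][j] + scale * sum(U[i][k] * V[j][k] for k in range(rank))
         for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    policy = SharedEncoderPolicy(obs_dim=4, latent_dim=8, act_dim=2, n_teams=3)
    obs = [0.1, -0.2, 0.3, 0.4]
    print(policy.act(0, obs))
    # Mutate only team 0's head; the shared encoder is left untouched.
    policy.heads[0] = low_rank_mutation(policy.heads[0], rank=1, scale=0.05,
                                        rng=random.Random(1))
    print(policy.act(0, obs))
```

Under these assumptions, a mutation touches only `rank * (act_dim + latent_dim)` random directions of a single head rather than the full parameter set, which is one plausible reading of why the heads are kept linear and lightweight.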

Original language: English
Pages (from-to): 24893-24911
Number of pages: 19
Journal: IEEE Internet of Things Journal
Volume: 13
Issue number: 11
DOI
Publication status: Published - 2026
