跳到主要导航 跳到搜索 跳到主要内容

PROVABLE MEMORY EFFICIENT SELF-PLAY ALGORITHM FOR MODEL-FREE REINFORCEMENT LEARNING

  • Zhejiang University
  • Beijing Institute of Technology
  • CAS - Institute of Acoustics

科研成果: 会议稿件论文同行评审

摘要

The thriving field of multi-agent reinforcement learning (MARL) studies how a group of interacting agents make decisions autonomously in a shared dynamic environment. Existing theoretical studies in this area suffer from at least two of the following obstacles: memory inefficiency, the heavy dependence of sample complexity on the long horizon and the large state space, the high computational complexity, non-Markov policy, non-Nash policy, and high burn-in cost. In this work, we take a step towards settling this problem by designing a model-free self-play algorithm Memory-Efficient Nash Q-Learning (ME-Nash-QL) for two-player zero-sum Markov games, which is a specific setting of MARL. ME-Nash-QL is proven to enjoy the following merits. First, it can output an ε-approximate Nash policy with space complexity O(SABH) and sample complexity Õ(H4SAB/ε2), where S is the number of states, {A, B} is the number of actions for two players, and H is the horizon length. It outperforms existing algorithms in terms of space complexity for tabular cases, and in terms of sample complexity for long horizons, i.e., when min{A, B} ≪ H2. Second, ME-Nash-QL achieves the lowest computational complexity O(Tpoly(AB)) while preserving Markov policies, where T is the number of samples. Third, ME-Nash-QL also achieves the best burn-in cost O(SAB poly(H)), whereas previous algorithms have a burn-in cost of at least O(S3AB poly(H)) to attain the same level of sample complexity with ours.

源语言英语
出版状态已出版 - 2024
已对外发布
活动12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, 奥地利
期限: 7 5月 202411 5月 2024

会议

会议12th International Conference on Learning Representations, ICLR 2024
国家/地区奥地利
Hybrid, Vienna
时期7/05/2411/05/24

指纹

探究 'PROVABLE MEMORY EFFICIENT SELF-PLAY ALGORITHM FOR MODEL-FREE REINFORCEMENT LEARNING' 的科研主题。它们共同构成独一无二的指纹。

引用此