Abstract
The thriving field of multi-agent reinforcement learning (MARL) studies how a group of interacting agents make decisions autonomously in a shared dynamic environment. Existing theoretical studies in this area suffer from at least two of the following obstacles: memory inefficiency, a heavy dependence of the sample complexity on the long horizon and the large state space, high computational complexity, non-Markov policies, non-Nash policies, and high burn-in cost. In this work, we take a step towards settling this problem by designing a model-free self-play algorithm, Memory-Efficient Nash Q-Learning (ME-Nash-QL), for two-player zero-sum Markov games, a specific setting of MARL. ME-Nash-QL is proven to enjoy the following merits. First, it can output an ε-approximate Nash policy with space complexity O(SABH) and sample complexity Õ(H⁴SAB/ε²), where S is the number of states, A and B are the numbers of actions of the two players, and H is the horizon length. It outperforms existing algorithms in terms of space complexity for tabular cases, and in terms of sample complexity for long horizons, i.e., when min{A, B} ≪ H². Second, ME-Nash-QL achieves the lowest computational complexity O(T·poly(AB)) while preserving Markov policies, where T is the number of samples. Third, ME-Nash-QL also achieves the best burn-in cost O(SAB·poly(H)), whereas previous algorithms require a burn-in cost of at least O(S³AB·poly(H)) to attain the same level of sample complexity as ours.
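To make the setting concrete, the sketch below illustrates the classical tabular Nash Q-learning template for a finite-horizon two-player zero-sum Markov game, the family of methods to which the abstract's guarantees refer. It is not the authors' ME-Nash-QL (whose memory-efficient design is not reproduced here); the environment object `env`, its `reset`/`step` interface, and the uniform exploration policy are illustrative assumptions. Each stage game's Nash value is computed with a standard linear program via `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y,
    solved by the standard LP formulation."""
    A, B = M.shape
    # Variables: x_1..x_A (row player's mixed strategy) and v (game value).
    c = np.zeros(A + 1)
    c[-1] = -1.0                                   # maximize v  <=>  minimize -v
    # For every column b:  v - sum_a x_a M[a, b] <= 0
    A_ub = np.hstack([-M.T, np.ones((B, 1))])
    b_ub = np.zeros(B)
    A_eq = np.ones((1, A + 1)); A_eq[0, -1] = 0.0  # sum_a x_a = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * A + [(None, None)]      # x >= 0, v unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def nash_q_learning(env, S, A, B, H, num_episodes, lr=0.1):
    """Tabular Nash Q-learning for a finite-horizon two-player zero-sum
    Markov game.  `env` is a hypothetical environment exposing
    reset() -> s  and  step(h, s, a, b) -> (r, s_next)."""
    Q = np.zeros((H + 1, S, A, B))   # Q[H] stays 0 (terminal step)
    V = np.zeros((H + 1, S))         # V[H] stays 0 (terminal value)
    for _ in range(num_episodes):
        s = env.reset()
        for h in range(H):
            # Exploration policy: uniform random self-play (a placeholder;
            # provably efficient algorithms use carefully designed policies).
            a = np.random.randint(A)
            b = np.random.randint(B)
            r, s_next = env.step(h, s, a, b)
            # One-step temporal-difference update toward r + V_{h+1}(s').
            target = r + V[h + 1, s_next]
            Q[h, s, a, b] += lr * (target - Q[h, s, a, b])
            # V_h(s) is the Nash value of the stage game Q_h(s, ., .).
            V[h, s] = matrix_game_value(Q[h, s])
            s = s_next
    return Q, V
```

Note that the Q-table of shape (H+1, S, A, B) already costs O(SABH) memory, matching the space complexity stated in the abstract; the sample- and burn-in-cost guarantees depend on algorithmic refinements beyond this template.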
| Original language | English |
|---|---|
| Publication status | Published - 2024 |
| Externally published | Yes |
| Event | 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria. Duration: 7 May 2024 → 11 May 2024 |
Conference
| Conference | 12th International Conference on Learning Representations, ICLR 2024 |
|---|---|
| Country/Territory | Austria |
| City | Hybrid, Vienna |
| Period | 7/05/24 → 11/05/24 |