Abstract
The thriving field of multi-agent reinforcement learning (MARL) studies how a group of interacting agents make decisions autonomously in a shared dynamic environment. Existing theoretical studies in this area suffer from at least two of the following obstacles: memory inefficiency, heavy dependence of the sample complexity on the horizon length and the size of the state space, high computational complexity, non-Markov output policies, non-Nash output policies, and a high burn-in cost. In this work, we take a step towards settling this problem by designing a model-free self-play algorithm, Memory-Efficient Nash Q-Learning (ME-Nash-QL), for two-player zero-sum Markov games, a specific setting of MARL. ME-Nash-QL is proven to enjoy the following merits. First, it can output an ε-approximate Nash policy with space complexity O(SABH) and sample complexity Õ(H⁴SAB/ε²), where S is the number of states, A and B are the numbers of actions of the two players, and H is the horizon length. It outperforms existing algorithms in terms of space complexity in the tabular case, and in terms of sample complexity for long horizons, i.e., when min{A, B} ≪ H². Second, ME-Nash-QL achieves the lowest computational complexity, O(T·poly(AB)), while preserving Markov policies, where T is the number of samples. Third, ME-Nash-QL also achieves the best burn-in cost, O(SAB·poly(H)), whereas previous algorithms require a burn-in cost of at least O(S³AB·poly(H)) to attain the same level of sample complexity as ours.
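For context, the sketch below shows the generic tabular Nash Q-learning update for a finite-horizon two-player zero-sum Markov game, the family of updates that algorithms such as ME-Nash-QL build on. It is not the paper's algorithm: the step-size schedule, the LP-based matrix-game solver, the function names, and the problem sizes are all illustrative assumptions.

```python
# Illustrative sketch only (not the paper's ME-Nash-QL): a generic tabular
# Nash Q-learning update for a finite-horizon two-player zero-sum Markov game.
# The step-size schedule, LP-based matrix-game solver, and sizes are assumptions.
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash_value(payoff: np.ndarray) -> float:
    """Max-min (Nash) value of a zero-sum matrix game with an A x B payoff
    matrix, from the max-player's perspective, via linear programming."""
    A, B = payoff.shape
    # Decision variables: x_1..x_A (max-player's mixed strategy) and v (value).
    c = np.zeros(A + 1)
    c[-1] = -1.0                                    # maximize v == minimize -v
    A_ub = np.hstack([-payoff.T, np.ones((B, 1))])  # v - x^T Q e_j <= 0 for all j
    b_ub = np.zeros(B)
    A_eq = np.ones((1, A + 1)); A_eq[0, -1] = 0.0   # sum_i x_i = 1
    bounds = [(0, None)] * A + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return -res.fun

def nash_q_update(Q, V, H, h, s, a, b, r, s_next, t):
    """One Q-learning step at stage h on the observed transition (s, a, b, r, s')."""
    alpha = (H + 1) / (H + t)               # H/(H+t)-style step size (assumption)
    target = r + V[h + 1][s_next]           # bootstrap with next state's Nash value
    Q[h][s, a, b] = (1 - alpha) * Q[h][s, a, b] + alpha * target
    V[h][s] = zero_sum_nash_value(Q[h][s])  # re-solve the stage matrix game at s

# Toy usage with made-up sizes: S states, A/B actions, horizon H.
S, A, B, H = 4, 3, 2, 5
Q = [np.full((S, A, B), float(H)) for _ in range(H)]  # optimistic initialization
V = [np.zeros(S) for _ in range(H + 1)]               # V[H] = 0 (terminal stage)
nash_q_update(Q, V, H, h=0, s=0, a=1, b=0, r=1.0, s_next=2, t=1)
```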
Original language | English |
---|---|
Publication status | Published - 2024 |
Externally published | Yes |
Event | 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria |
Duration | 7 May 2024 → 11 May 2024 |
Conference
Conference | 12th International Conference on Learning Representations, ICLR 2024 |
---|---|
Country/Territory | Austria |
City | Hybrid, Vienna |
Period | 7/05/24 → 11/05/24 |