Finite-Time Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Jun Sun, Gang Wang, Georgios B. Giannakis, Qinmin Yang, Zaiyue Yang

Research output: Contribution to journal › Conference article › peer-review

26 Citations (Scopus)

Abstract

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in various engineering applications, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, through exchanging local estimates with neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an 'additional' projection step to control the 'gradient' bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a neighborhood of the optimum. The resultant error bounds are the first of their type, in the sense that they hold under the most practical assumptions, which is made possible by a novel multi-step Lyapunov analysis.
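
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a decentralized TD(0) iteration with linear function approximation can look like. It assumes a commonly used form: each agent averages its neighbors' parameter vectors with a doubly stochastic mixing matrix, then adds a local TD(0) correction computed from its own private reward. The function name `decentralized_td0`, the toy Markov chain, the features, the rewards, and the mixing weights are all hypothetical illustrations, not taken from the paper.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def decentralized_td0(features, transition, rewards, mixing, theta0,
                      alpha=0.05, gamma=0.9, steps=2000, seed=0):
    """Run an assumed form of decentralized TD(0) on one Markovian trajectory.

    features[s]   : feature vector phi(s) of state s
    transition[s] : next-state probabilities under the fixed policy
    rewards[m][s] : private reward of agent m in state s (hypothetical model)
    mixing[m][n]  : doubly stochastic weight agent m puts on agent n
    theta0[m]     : initial parameter vector of agent m
    """
    rng = random.Random(seed)
    n_states = len(features)
    n_agents = len(theta0)
    dim = len(theta0[0])
    theta = [list(t) for t in theta0]
    s = 0
    for _ in range(steps):
        # All agents observe the same transition of the shared environment.
        s_next = rng.choices(range(n_states), weights=transition[s])[0]
        new_theta = []
        for m in range(n_agents):
            # Consensus step: weighted average of the agents' estimates.
            mixed = [sum(mixing[m][n] * theta[n][i] for n in range(n_agents))
                     for i in range(dim)]
            # Local TD error, using agent m's private reward only.
            delta = (rewards[m][s]
                     + gamma * dot(features[s_next], theta[m])
                     - dot(features[s], theta[m]))
            new_theta.append([mixed[i] + alpha * delta * features[s][i]
                              for i in range(dim)])
        theta = new_theta
        s = s_next
    return theta


# Toy problem (all values made up): 3 states, 2 features, 3 agents.
features = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
transition = [[0.1, 0.6, 0.3], [0.4, 0.2, 0.4], [0.5, 0.3, 0.2]]
rewards = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mixing = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
theta0 = [[0.0, 0.0], [1.0, -1.0], [-2.0, 3.0]]

theta = decentralized_td0(features, transition, rewards, mixing, theta0)
for m, t in enumerate(theta):
    print(f"agent {m}: theta = {t}")
```

With a constant step size `alpha`, the local estimates in this sketch are not expected to agree exactly; they stay within a small distance of one another, which is in the spirit of the "neighborhood of the optimum" statement in the abstract. Setting `alpha=0` reduces the iteration to plain consensus averaging.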

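Original language: English
Pages (from-to): 4485-4495
Number of pages: 11
Journal: Proceedings of Machine Learning Research
Volume: 108
Publication status: Published - 2020
Externally published: Yes
Event: 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020 - Virtual, Online
Duration: 26 Aug 2020 to 28 Aug 2020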
