Decentralized TD tracking with linear function approximation and its finite-time analysis

Gang Wang*, Songtao Lu, Georgios B. Giannakis, Gerald Tesauro, Jian Sun

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

19 Citations (Scopus)

Abstract

The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over the graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) as well as Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm forgoes the limiting error bounds on the number of agents, which endows it with performance comparable to that of centralized TD methods that are the sharpest known to date.
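The abstract does not give the update equations, so the following is only a minimal sketch of what "wedding TD learning with gradient tracking" could look like, not the authors' exact algorithm. It applies the standard gradient-tracking recursion to local TD(0) directions with linear features. Everything concrete here is an assumption made for illustration: the toy feature table, the local rewards, the mixing matrix `W`, the step size `alpha`, the i.i.d. sampling of transitions, and all function names.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mix(W, rows):
    """Weighted neighbor average: row i becomes sum_j W[i][j] * rows[j]."""
    n, d = len(rows), len(rows[0])
    return [[sum(W[i][j] * rows[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def td_direction(theta, phi_s, phi_next, reward, gamma):
    """Local TD(0) direction: temporal-difference error times phi(s)."""
    delta = reward + gamma * dot(phi_next, theta) - dot(phi_s, theta)
    return [delta * x for x in phi_s]


def run_td_tracking(num_steps=200, alpha=0.05, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Hypothetical toy problem: 3 agents, 4 states, 2 features per state.
    features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
    local_rewards = [[1.0, 0.0, 0.0, 0.0],   # agent 0, reward per state
                     [0.0, 1.0, 0.0, 0.0],   # agent 1
                     [0.0, 0.0, 1.0, 2.0]]   # agent 2
    # Assumed doubly stochastic mixing matrix (rows and columns sum to 1).
    W = [[0.50, 0.25, 0.25],
         [0.25, 0.50, 0.25],
         [0.25, 0.25, 0.50]]
    n, d = len(W), len(features[0])

    def sample_transition():
        # i.i.d. setting: state and next state drawn uniformly at random.
        return rng.randrange(len(features)), rng.randrange(len(features))

    def local_directions(thetas, s, s_next):
        # All agents see the same transition but use their private reward.
        return [td_direction(thetas[i], features[s], features[s_next],
                             local_rewards[i][s], gamma) for i in range(n)]

    thetas = [[0.0] * d for _ in range(n)]
    s, s_next = sample_transition()
    g_prev = local_directions(thetas, s, s_next)
    trackers = [g[:] for g in g_prev]  # tracker starts at the local direction

    for _ in range(num_steps):
        # Parameter step: average with neighbors, then move along the tracker.
        mixed = mix(W, thetas)
        thetas = [[mixed[i][k] + alpha * trackers[i][k] for k in range(d)]
                  for i in range(n)]
        # Tracker step: average with neighbors, then add the change in the
        # local TD direction so the tracker follows the network-wide average.
        s, s_next = sample_transition()
        g_new = local_directions(thetas, s, s_next)
        mixed_y = mix(W, trackers)
        trackers = [[mixed_y[i][k] + g_new[i][k] - g_prev[i][k]
                     for k in range(d)] for i in range(n)]
        g_prev = g_new

    return thetas, trackers, g_prev
```

Because `W` is doubly stochastic, the sum of the trackers equals the sum of the most recent local TD directions at every iteration. This is the usual gradient-tracking invariant, and it is what lets each agent follow the network-average direction while only exchanging information with its neighbors.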

Original language: English
Journal: Advances in Neural Information Processing Systems
2020-December
Publication status: Published - 2020
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 6 Dec 2020 to 12 Dec 2020
