Decentralized TD tracking with linear function approximation and its finite-time analysis

Gang Wang^*, Songtao Lu, Georgios B. Giannakis, Gerald Tesauro, Jian Sun

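^*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Conference article › peer-review

19 Citations (Scopus)

Abstract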

The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over the graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) as well as Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm forgoes the limiting error bounds on the number of agents, which endows it with performance comparable to that of centralized TD methods that are the sharpest known to date.

Original language	English
Journal	Advances in Neural Information Processing Systems
Volume	2020-December
Publication status	Published - 2020
Event	34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online Duration: 6 Dec 2020 → 12 Dec 2020

Other files and links

Link to publication in Scopus

Cite this

@article{3f283f0ea1d043cdbd2eb70028cce337,

title = "Decentralized TD tracking with linear function approximation and its finite-time analysis",

abstract = "The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over the graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) as well as Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm forgoes the limiting error bounds on the number of agents, which endows it with performance comparable to that of centralized TD methods that are the sharpest known to date.",

author = "Gang Wang and Songtao Lu and Giannakis, {Georgios B.} and Gerald Tesauro and Jian Sun",

note = "Publisher Copyright: {\textcopyright} 2020 Neural information processing systems foundation. All rights reserved.; 34th Conference on Neural Information Processing Systems, NeurIPS 2020 ; Conference date: 06-12-2020 Through 12-12-2020",

year = "2020",

language = "English",

volume = "2020-December",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

publisher = "Neural information processing systems foundation",

}

TY - JOUR

T1 - Decentralized TD tracking with linear function approximation and its finite-time analysis

AU - Wang, Gang

AU - Lu, Songtao

AU - Giannakis, Georgios B.

AU - Tesauro, Gerald

AU - Sun, Jian

PY - 2020

Y1 - 2020

N2 - The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over the graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) as well as Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm forgoes the limiting error bounds on the number of agents, which endows it with performance comparable to that of centralized TD methods that are the sharpest known to date.

AB - The present contribution deals with decentralized policy evaluation in multi-agent Markov decision processes using temporal-difference (TD) methods with linear function approximation for scalability. The agents cooperate to estimate the value function of such a process by observing continual state transitions of a shared environment over the graph of interconnected nodes (agents), along with locally private rewards. Different from existing consensus-type TD algorithms, the approach here develops a simple decentralized TD tracker by wedding TD learning with gradient tracking techniques. The non-asymptotic properties of the novel TD tracker are established for both independent and identically distributed (i.i.d.) as well as Markovian transitions through a unifying multistep Lyapunov analysis. In contrast to the prior art, the novel algorithm forgoes the limiting error bounds on the number of agents, which endows it with performance comparable to that of centralized TD methods that are the sharpest known to date.

UR - http://www.scopus.com/inward/record.url?scp=85103810816&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85103810816

SN - 1049-5258

VL - 2020-December

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020

Y2 - 6 December 2020 through 12 December 2020

ER -
