TY - JOUR
T1 - Finite-Time Error Bounds for Biased Stochastic Approximation with Application to Q-Learning
AU - Wang, Gang
AU - Giannakis, Georgios B.
N1 - Publisher Copyright:
Copyright © 2020 by the author(s)
PY - 2020
Y1 - 2020
N2 - Inspired by the widespread use of Q-learning algorithms in reinforcement learning (RL), this paper studies a class of biased stochastic approximation (SA) procedures under an 'ergodic-like' assumption on the underlying stochastic noise sequence. Leveraging a multistep Lyapunov function that looks ahead to several future updates to accommodate the gradient bias, we prove a general result on the convergence of the iterates, and use it to derive finite-time bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint makes finite-time analysis of biased SA algorithms possible under a broad family of stochastic perturbations. For direct comparison with past works, we also demonstrate these bounds by applying them to Q-learning with linear function approximation, under the realistic Markov chain observation model. The resultant finite-time error bound for Q-learning is the first of its kind, in the sense that it holds: i) for the unmodified version (i.e., without making any modifications to the updates), and ii) for Markov chains starting from any initial distribution; at least one of these conditions has to be violated for existing results to apply.
AB - Inspired by the widespread use of Q-learning algorithms in reinforcement learning (RL), this paper studies a class of biased stochastic approximation (SA) procedures under an 'ergodic-like' assumption on the underlying stochastic noise sequence. Leveraging a multistep Lyapunov function that looks ahead to several future updates to accommodate the gradient bias, we prove a general result on the convergence of the iterates, and use it to derive finite-time bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint makes finite-time analysis of biased SA algorithms possible under a broad family of stochastic perturbations. For direct comparison with past works, we also demonstrate these bounds by applying them to Q-learning with linear function approximation, under the realistic Markov chain observation model. The resultant finite-time error bound for Q-learning is the first of its kind, in the sense that it holds: i) for the unmodified version (i.e., without making any modifications to the updates), and ii) for Markov chains starting from any initial distribution; at least one of these conditions has to be violated for existing results to apply.
UR - http://www.scopus.com/inward/record.url?scp=85108389875&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108389875
SN - 2640-3498
VL - 108
SP - 3015
EP - 3024
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020
Y2 - 26 August 2020 through 28 August 2020
ER -