TY - JOUR
T1 - Finite-Time Error Bounds of Biased Stochastic Approximation with Application to TD-Learning
AU - Wang, Gang
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2022
Y1 - 2022
N2 - Motivated by the recent success of reinforcement learning algorithms, this paper studies a class of biased stochastic approximation (SA) procedures under a mild 'ergodicity-like' assumption on the random noise sequence. Building on a multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (thus gaining control over the bias), we prove a general result on the convergence of the SA iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint renders finite-time analysis of biased SA algorithms under a family of stochastic perturbations possible. For direct comparison with prior work, we demonstrate these bounds by applying them to TD-learning with linear function approximation, under the Markov chain observation model. The resultant finite-time error bound for TD-learning is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without any modification to the updates) using even nonlinear approximators; as well as for Markov chains ii) under sublinear mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.
AB - Motivated by the recent success of reinforcement learning algorithms, this paper studies a class of biased stochastic approximation (SA) procedures under a mild 'ergodicity-like' assumption on the random noise sequence. Building on a multistep Lyapunov function that looks ahead to several future updates to accommodate the stochastic perturbations (thus gaining control over the bias), we prove a general result on the convergence of the SA iterates, and use it to derive non-asymptotic bounds on the mean-square error in the case of constant stepsizes. This novel viewpoint renders finite-time analysis of biased SA algorithms under a family of stochastic perturbations possible. For direct comparison with prior work, we demonstrate these bounds by applying them to TD-learning with linear function approximation, under the Markov chain observation model. The resultant finite-time error bound for TD-learning is the first of its kind, in the sense that it holds i) for the unmodified versions (i.e., without any modification to the updates) using even nonlinear approximators; as well as for Markov chains ii) under sublinear mixing conditions and iii) starting from any initial distribution, at least one of which has to be violated for existing results to be applicable.
KW - Lyapunov analysis
KW - Markovian noise
KW - Stochastic averaging
KW - finite-time analysis
UR - http://www.scopus.com/inward/record.url?scp=85126638177&partnerID=8YFLogxK
U2 - 10.1109/TSP.2021.3128723
DO - 10.1109/TSP.2021.3128723
M3 - Article
AN - SCOPUS:85126638177
SN - 1053-587X
VL - 70
SP - 950
EP - 962
JO - IEEE Transactions on Signal Processing
JF - IEEE Transactions on Signal Processing
ER -
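
For orientation, the update analyzed in the abstract (TD(0) with linear function approximation, a constant stepsize, and samples drawn along a single Markov-chain trajectory) can be sketched as below; this is a minimal illustration, and the names, toy chain, and default parameters are assumptions for the sketch rather than the authors' implementation.

import numpy as np

def td0_linear(phi, transition, reward, theta0, alpha=0.01, gamma=0.95,
               num_steps=10_000, s0=0):
    """TD(0) with linear function approximation and a constant stepsize alpha,
    driven by one Markov-chain trajectory (the observation model the abstract
    refers to). All names and defaults here are illustrative assumptions."""
    theta = np.array(theta0, dtype=float)
    s = s0
    for _ in range(num_steps):
        s_next = transition(s)                 # next state sampled from the chain
        r = reward(s, s_next)                  # observed reward
        # TD error with value estimates V(s) = phi(s)^T theta
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        # SA update; the driving noise is Markovian rather than i.i.d., hence "biased" SA
        theta = theta + alpha * delta * phi(s)
        s = s_next
    return theta

# Toy 3-state chain, only to exercise the sketch
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
rng = np.random.default_rng(0)
phi = lambda s: np.eye(3)[s]                   # one-hot (tabular) features as a special case
transition = lambda s: rng.choice(3, p=P[s])
reward = lambda s, s_next: 1.0 if s_next == 2 else 0.0
theta_hat = td0_linear(phi, transition, reward, theta0=np.zeros(3))

With a constant stepsize, the iterates do not converge exactly; the paper's contribution is a finite-time mean-square error bound for exactly this kind of unmodified update under Markovian noise.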