A zeroth-order stochastic implicit method for bilevel-structured actor-critic schemes

Haochen Tao, Shisheng Cui*, Zhuo Li, Jian Sun

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Reinforcement learning algorithms are central to the cognition and decision-making of embodied intelligent agents. A bilevel optimization (BO) modeling approach, along with a host of efficient BO algorithms, has been proven to be an effective means of addressing actor-critic (AC) policy optimization problems. In this work, based on a bilevel-structured AC problem model, an implicit zeroth-order stochastic algorithm is developed. A locally randomized spherical smoothing technique, which can be applied to nonsmooth nonconvex implicit AC formulations and avoid the closed-form lower-level mapping, is introduced. In the proposed zeroth-order scheme, the gradient of the implicit function can be approximated through inexact lower-level value estimations that are practically available. Under suitable assumptions, the algorithmic framework designed for the bilevel AC method is characterized by convergence guarantees under a fixed stepsize and smoothing parameter. Moreover, the proposed algorithm is equipped with the overall iteration complexity of $\mathcal{O}(n^2 L_0^2 \tilde{L}_0^2 \epsilon^{-1})$. The convergence performance of the proposed algorithm is verified through numerical simulations.
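As a rough illustration of the idea behind randomized spherical smoothing, the sketch below estimates a gradient of an implicit objective from function values only, with the lower-level problem solved inexactly. It is a generic textbook-style estimator under stated assumptions, not the scheme analyzed in the article: the toy lower-level problem, the nonsmooth upper-level objective, and all names (`zo_gradient`, `inexact_lower_level`, `implicit_objective`, `zo_descent`) and parameter values are hypothetical.

```python
import math
import random


def unit_sphere_sample(n, rng):
    """Draw a direction uniformly from the unit sphere in R^n."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def zo_gradient(f, x, eta, rng):
    """One-sample, two-point spherical-smoothing gradient estimate of f at x."""
    n = len(x)
    v = unit_sphere_sample(n, rng)
    x_plus = [xi + eta * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eta * vi for xi, vi in zip(x, v)]
    scale = n * (f(x_plus) - f(x_minus)) / (2.0 * eta)
    return [scale * vi for vi in v]


def inexact_lower_level(x, steps=20, lr=0.5):
    """Toy lower level (assumption): a few gradient steps on 0.5 * ||y - x||^2."""
    y = [0.0] * len(x)
    for _ in range(steps):
        y = [yi - lr * (yi - xi) for yi, xi in zip(y, x)]
    return y


def implicit_objective(x):
    """Toy nonsmooth upper-level objective (assumption), evaluated at the inexact y(x)."""
    y = inexact_lower_level(x)
    return sum(abs(yi - 1.0) for yi in y)


def zo_descent(x0, stepsize=0.01, eta=0.05, iters=3000, seed=0):
    """Fixed-stepsize, fixed-smoothing zeroth-order descent on the implicit objective."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zo_gradient(implicit_objective, x, eta, rng)
        x = [xi - stepsize * gi for xi, gi in zip(x, g)]
    return x
```

Calling `zo_descent([3.0, -2.0, 0.5])` drives the toy objective toward its minimizer near `(1, 1, 1)`. With a fixed stepsize and smoothing parameter the iterates settle in a small neighborhood of the minimizer rather than converging exactly.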

Original language: English
Article number: 150204
Journal: Science China Information Sciences
Volume: 68
Issue number: 5
Publication status: Published - May 2025

Keywords

  • actor-critic
  • bilevel optimization
  • implicit programming
  • stochastic approximation
  • zeroth-order algorithm
