A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning

Yifan Hu, Junjie Fu, Yuezu Lv

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, a class of optimal control problems for stochastic discrete-time systems is addressed via average reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for a Markov decision process (MDP). It is proven that, under the average reward criterion, the admissible policies are gain-optimal and the optimal policy is bias-optimal. Then, sufficient conditions for almost sure (a.s.) stabilization of the system are proposed. Based on these results, an on-policy average-reward reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.
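To make the average-reward setting concrete, the following is a minimal sketch (not the paper's algorithm) of on-policy average-reward TD(0) evaluation for a scalar stochastic discrete-time system under a fixed stabilizing feedback policy. The constants a, b, K, the noise levels, and the quadratic value feature are illustrative assumptions only.

```python
import numpy as np

# Minimal sketch: differential (average-reward) TD(0) policy evaluation
# for x_{k+1} = a*x_k + b*u_k + w_k with reward r_k = -(x_k^2 + u_k^2),
# under the assumed on-policy feedback u_k = -K*x_k + exploration noise.

rng = np.random.default_rng(0)
a, b, K = 0.9, 1.0, 0.5          # assumed system and policy gains
alpha, beta = 0.01, 0.001        # step sizes for value weight and gain estimate
w = 0.0                          # weight of the quadratic feature phi(x) = x^2
rho = 0.0                        # running estimate of the average reward (gain)

x = 0.0
for k in range(200_000):
    u = -K * x + 0.1 * rng.standard_normal()        # on-policy action with exploration
    r = -(x**2 + u**2)                              # stage reward (negative quadratic cost)
    x_next = a * x + b * u + 0.1 * rng.standard_normal()

    # Differential TD error: the gain estimate rho replaces a discount factor
    delta = (r - rho) + w * x_next**2 - w * x**2
    w += alpha * delta * x**2                       # semi-gradient step on the value weight
    rho += beta * delta                             # track the average reward
    x = x_next

print(f"estimated average reward (gain): {rho:.4f}")
```

In this sketch the pair (rho, w) approximates the gain and a differential value function for the fixed policy; the paper's algorithm additionally addresses policy optimality (gain- and bias-optimality) and a.s. stabilization, which are not captured here.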

Original language: English
Title of host publication: Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 829-834
Number of pages: 6
ISBN (Electronic): 9781728162072
DOIs
Publication status: Published - 10 May 2021
Externally published: Yes
Event: 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021 - Virtual, Online
Duration: 10 May 2021 - 13 May 2021

Publication series

Name: Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021

Conference

Conference: 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021
City: Virtual, Online
Period: 10/05/21 - 13/05/21

Keywords

  • Average reward
  • Optimal control
  • Reinforcement learning
  • Stochastic discrete-time system
