A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning

Yifan Hu, Junjie Fu, Yuezu Lv

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

This paper addresses a class of optimal control problems for stochastic discrete-time systems using average reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for a Markov decision process (MDP). It is proven that, under the average reward criterion, the admissible policies are gain-optimal and the optimal policy is bias-optimal. Then, sufficient conditions for almost sure (a.s.) stabilization of the system are proposed. Based on these results, an on-policy average-reward-based reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.
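
To make the phrase "on-policy average-reward-based reinforcement learning" concrete, the following is a minimal sketch of tabular differential SARSA, a textbook on-policy average-reward method. It is not the algorithm proposed in the paper, whose details are not given in this record. The toy system (a clipped scalar integrator with discrete noise), the quadratic cost, and all names and parameter values (`differential_sarsa`, `alpha`, `beta`, `epsilon`) are assumptions chosen only for illustration.

```python
import random


def differential_sarsa(steps=200_000, alpha=0.1, beta=0.01, epsilon=0.2, seed=0):
    """Tabular differential SARSA on a hypothetical toy stochastic system.

    Assumed dynamics: x_next = clip(x + u + w, -2, 2), with noise w in {-1, 0, 1}.
    Assumed reward:   r = -(x^2 + 0.1 * u^2), so staying near x = 0 is best.
    """
    rng = random.Random(seed)
    states = [-2, -1, 0, 1, 2]
    actions = [-1, 0, 1]
    q = {(x, u): 0.0 for x in states for u in actions}  # differential action values
    rho = 0.0  # running estimate of the average reward (gain)

    def step(x, u):
        # Sample the disturbance and apply the clipped state update.
        w = rng.choices([-1, 0, 1], weights=[0.1, 0.8, 0.1])[0]
        x_next = max(-2, min(2, x + u + w))
        reward = -(x * x + 0.1 * u * u)
        return x_next, reward

    def choose(x):
        # Epsilon-greedy behaviour policy; the same policy is evaluated (on-policy).
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda u: q[(x, u)])

    x = 2
    u = choose(x)
    for _ in range(steps):
        x_next, r = step(x, u)
        u_next = choose(x_next)
        # Differential TD error: the reward is measured relative to the gain estimate.
        delta = r - rho + q[(x_next, u_next)] - q[(x, u)]
        rho += beta * delta
        q[(x, u)] += alpha * delta
        x, u = x_next, u_next

    policy = {s: max(actions, key=lambda a: q[(s, a)]) for s in states}
    return policy, rho


if __name__ == "__main__":
    learned_policy, gain_estimate = differential_sarsa()
    print("greedy policy:", learned_policy)
    print("estimated average reward:", gain_estimate)
```

In this sketch `rho` plays the role of the gain (long-run average reward) and the table `q` approximates relative, bias-like values, which is the distinction the abstract draws between gain-optimal and bias-optimal policies. On the toy system, the learned greedy policy should push the state back toward zero from either boundary.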

Original language: English
Title of host publication: Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 829-834
Number of pages: 6
ISBN (electronic): 9781728162072
DOI
Publication status: Published - 10 May 2021
Externally published
Event: 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021 - Virtual, Online
Duration: 10 May 2021 to 13 May 2021

Publication series

Name: Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021

Conference

Conference: 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021
Location: Virtual, Online
Period: 10/05/21 to 13/05/21
