A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning

Yifan Hu; Junjie Fu; Yuezu Lv

doi:10.1109/ICPS49255.2021.9468152

Southeast University, Nanjing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, a class of optimal control problems for stochastic discrete-time systems is addressed using average reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for a Markov decision process (MDP). It is proven that, under the average reward criterion, the admissible policies are gain-optimal and the optimal policy is bias-optimal. Then, sufficient conditions for almost sure (a.s.) stabilization of the system are proposed. Based on the above results, an on-policy average-reward-based reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.
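The abstract names the key ingredients (an average reward criterion, gain and bias optimality, and an on-policy learning rule), but this record does not reproduce the paper's algorithm. As a hedged illustration only, the sketch below applies the textbook tabular differential SARSA update to a hypothetical two-state, two-action system; the transition probabilities, costs, step sizes, and all function names (`reward`, `step`, `exact_gain`, `greedy_action`, `differential_sarsa`) are assumptions made for this example and are not the authors' method. Under the average reward criterion, a policy's gain g is its long-run average reward, lim (1/N) E[r_0 + ... + r_{N-1}], and its bias h(s) is the accumulated deviation of rewards from g when starting in state s; a gain-optimal policy maximizes g, and a bias-optimal policy maximizes h among the gain-optimal ones. Differential SARSA learns an estimate r_bar of the gain together with bias-like action values Q, replacing discounting by subtracting r_bar from every reward.

```python
import random

# Hypothetical two-state "regulation" MDP, NOT the system studied in the paper.
# State 0: near the origin, state 1: deviated.  Action 0: no control, action 1: apply control.
P_DEVIATE = (0.3, 0.2)  # P(0 -> 1 | action a)
P_RETURN = (0.2, 0.8)   # P(1 -> 0 | action a)


def reward(s, a):
    """Negative stage cost: 2 per step spent deviated plus 0.5 per control action."""
    return -(2.0 * s + 0.5 * a)


def step(s, a, rng):
    """Sample the next state of the stochastic system."""
    if s == 0:
        return 1 if rng.random() < P_DEVIATE[a] else 0
    return 0 if rng.random() < P_RETURN[a] else 1


def exact_gain(policy):
    """Long-run average reward of a deterministic policy (policy[s] = action in state s)."""
    p = P_DEVIATE[policy[0]]
    q = P_RETURN[policy[1]]
    pi0 = q / (p + q)  # stationary probability of state 0 for a two-state chain
    return pi0 * reward(0, policy[0]) + (1.0 - pi0) * reward(1, policy[1])


def greedy_action(Q, s):
    # ties are broken toward action 0 (no control)
    return 0 if Q[s][0] >= Q[s][1] else 1


def differential_sarsa(steps=200_000, alpha=0.005, beta=0.0005, eps=0.1, seed=0):
    """Tabular differential SARSA: on-policy TD control under the average reward criterion."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]  # differential action values (defined up to a constant)
    r_bar = 0.0                   # running estimate of the average reward (gain)

    def act(s):
        # epsilon-greedy behaviour policy; learning is on-policy, so r_bar tracks its gain
        if rng.random() < eps:
            return rng.randrange(2)
        return greedy_action(Q, s)

    s = 0
    a = act(s)
    for _ in range(steps):
        r = reward(s, a)
        s_next = step(s, a, rng)
        a_next = act(s_next)
        # differential TD error: no discount factor; rewards are measured relative to r_bar
        delta = r - r_bar + Q[s_next][a_next] - Q[s][a]
        r_bar += beta * delta
        Q[s][a] += alpha * delta
        s, a = s_next, a_next

    return [greedy_action(Q, x) for x in range(2)], r_bar, Q


if __name__ == "__main__":
    greedy, r_bar, _ = differential_sarsa()
    policies = [(a0, a1) for a0 in range(2) for a1 in range(2)]
    best = max(policies, key=exact_gain)
    print("learned greedy policy:", greedy, " estimated gain:", round(r_bar, 3))
    print("best policy by enumeration:", best, " exact gain:", round(exact_gain(best), 3))
```

In this toy setting the enumerated optimum is the policy (0, 1), "no control near the origin, control when deviated", with exact gain -15/22 ≈ -0.682, and the learned greedy policy should match it. The estimate r_bar should land somewhat below that value, because on-policy learning measures the gain of the exploring epsilon-greedy policy rather than of the greedy one.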

Original language	English
Title of host publication	Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	829-834
Number of pages	6
ISBN (Electronic)	9781728162072
DOIs	https://doi.org/10.1109/ICPS49255.2021.9468152
Publication status	Published - 10 May 2021
Externally published	Yes
Event	4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021 - Virtual, Online. Duration: 10 May 2021 → 13 May 2021

Publication series

Name	Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021

Conference

Conference	4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021
City	Virtual, Online
Period	10/05/21 → 13/05/21

Keywords

Average reward
Optimal control
Reinforcement learning
Stochastic discrete-time system

Cite this

Hu, Y., Fu, J., & Lv, Y. (2021). A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning. In Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021 (pp. 829-834). Article 9468152 (Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICPS49255.2021.9468152

Hu, Yifan; Fu, Junjie; Lv, Yuezu. / A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning. Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 829-834 (Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021).

@inproceedings{56581483afe0443c97ba5c1d0a35190a,

title = "A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning",

abstract = "In this paper, a class of optimal control problem for stochastic discrete-time systems is addressed by average reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for Markov decision process (MDP). It is proven that the admissible policies are gain-optimal and the optimal policy is bias-optimal with the average reward criterion, respectively. Then, sufficient conditions to almost surely (a.s.) stabilize the system are proposed. Based on the above results, an on-policy average-reward-based reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.",

keywords = "Average reward, Optimal control, Reinforcement learning, Stochastic discrete-time system",

author = "Yifan Hu and Junjie Fu and Yuezu Lv",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.; 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021 ; Conference date: 10-05-2021 Through 13-05-2021",

year = "2021",

month = may,

day = "10",

doi = "10.1109/ICPS49255.2021.9468152",

language = "English",

series = "Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "829--834",

booktitle = "Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021",

address = "United States",

}

Hu, Y, Fu, J & Lv, Y 2021, A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning. in Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021, 9468152, Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021, Institute of Electrical and Electronics Engineers Inc., pp. 829-834, 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021, Virtual, Online, 10/05/21. https://doi.org/10.1109/ICPS49255.2021.9468152

A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning. / Hu, Yifan; Fu, Junjie; Lv, Yuezu.
Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021. Institute of Electrical and Electronics Engineers Inc., 2021. p. 829-834, 9468152 (Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021).

TY  - GEN

T1  - A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning

AU  - Hu, Yifan

AU  - Fu, Junjie

AU  - Lv, Yuezu

PY  - 2021/05/10

Y1  - 2021/05/10

N2  - In this paper, a class of optimal control problem for stochastic discrete-time systems is addressed by average reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for Markov decision process (MDP). It is proven that the admissible policies are gain-optimal and the optimal policy is bias-optimal with the average reward criterion, respectively. Then, sufficient conditions to almost surely (a.s.) stabilize the system are proposed. Based on the above results, an on-policy average-reward-based reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.

AB  - In this paper, a class of optimal control problem for stochastic discrete-time systems is addressed by average reward reinforcement learning. First, the optimal control problem of the stochastic discrete-time system is transformed into a sequential decision problem for Markov decision process (MDP). It is proven that the admissible policies are gain-optimal and the optimal policy is bias-optimal with the average reward criterion, respectively. Then, sufficient conditions to almost surely (a.s.) stabilize the system are proposed. Based on the above results, an on-policy average-reward-based reinforcement learning algorithm is developed. Finally, simulation results are provided to illustrate the effectiveness of the proposed algorithm.

KW  - Average reward

KW  - Optimal control

KW  - Reinforcement learning

KW  - Stochastic discrete-time system

UR  - http://www.scopus.com/inward/record.url?scp=85112361849&partnerID=8YFLogxK

U2  - 10.1109/ICPS49255.2021.9468152

DO  - 10.1109/ICPS49255.2021.9468152

M3  - Conference contribution

AN  - SCOPUS:85112361849

T3  - Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021

SP  - 829

EP  - 834

BT  - Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021

PB  - Institute of Electrical and Electronics Engineers Inc.

T2  - 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021

Y2  - 10 May 2021 through 13 May 2021

ER  -

Hu Y, Fu J, Lv Y. A class of optimal control problem for stochastic discrete-time systems with average reward reinforcement learning. In Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021. Institute of Electrical and Electronics Engineers Inc. 2021. p. 829-834. 9468152. (Proceedings - 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems, ICPS 2021). doi: 10.1109/ICPS49255.2021.9468152
