A Combined Policy Gradient and Q-learning Method for Data-driven Optimal Control Problems

Mingduo Lin, Derong Liu, Bo Zhao, Qionghai Dai, Yi Dong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

This paper focuses on data-driven controller design for optimal control problems of nonlinear nonaffine discrete-time systems. A novel policy gradient and Q-learning (PGQL) adaptive algorithm, which learns the optimal control policy from real empirical data, is developed without requiring knowledge of the system dynamics. A policy iteration scheme is designed to iteratively update the approximate Q-function, and the control policy is improved via a gradient method until the two converge to bounded regions of the optimal Q-function and the optimal control policy, respectively. Two neural networks (NNs) are employed to implement the developed algorithm. Moreover, a convergence analysis of the approximate Q-function is established. Since the control policy is parameterized, it can be updated by adjusting the actor-NN parameters in the direction of the performance gradient. Finally, simulation results are given to verify the performance of the developed PGQL adaptive algorithm.
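
The loop described in the abstract (evaluate the Q-function of the current policy from data, then move the policy parameters along the performance gradient) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the scalar plant in `dynamics`, the quadratic `utility`, the undiscounted cost, and the step size. A linear-in-parameters Q-function approximator and a linear policy `u = theta * x` stand in for the paper's two NNs. The plant is used only to generate data tuples; the learning steps never call it.

```python
import numpy as np

def dynamics(x, u):
    # Hypothetical nonaffine plant (assumption); used only to generate data.
    return 0.9 * x + 0.5 * np.tanh(u)

def utility(x, u):
    # Assumed quadratic stage cost.
    return x**2 + u**2

def features(x, u):
    # Quadratic features so that Q(x, u) ~= w . [x^2, x*u, u^2].
    return np.stack([x**2, x * u, u**2], axis=-1)

def evaluate_policy(theta, x, u, x_next):
    # Policy evaluation from data: least-squares fit of the Bellman
    # equation Q(x, u) = U(x, u) + Q(x', pi(x')) with pi(x) = theta * x.
    phi = features(x, u)
    phi_next = features(x_next, theta * x_next)
    w, *_ = np.linalg.lstsq(phi - phi_next, utility(x, u), rcond=None)
    return w

def improve_policy(theta, w, x, step=0.3):
    # Policy improvement: dQ/du evaluated at u = theta * x, chained with
    # d(pi)/d(theta) = x, averaged over the sampled states.
    dq_du = w[1] * x + 2.0 * w[2] * theta * x
    return theta - step * np.mean(dq_du * x)

def pgql_sketch(n_samples=500, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    # Exploratory data set of (x, u, x') tuples.
    x = rng.uniform(-1.0, 1.0, n_samples)
    u = rng.uniform(-1.0, 1.0, n_samples)
    x_next = dynamics(x, u)
    theta = 0.0  # initial policy: zero control (stabilizing for this plant)
    for _ in range(n_iters):
        w = evaluate_policy(theta, x, u, x_next)
        theta = improve_policy(theta, w, x)
    return theta

def rollout_cost(theta, x0=1.0, steps=200):
    # Accumulated cost of the policy u = theta * x from x0, for comparison.
    x, cost = x0, 0.0
    for _ in range(steps):
        u = theta * x
        cost += utility(x, u)
        x = dynamics(x, u)
    return cost
```

Comparing `rollout_cost(pgql_sketch())` with `rollout_cost(0.0)` gives a rough check that the learned gain lowers the accumulated cost relative to the initial zero-control policy on this toy plant.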

Original language: English
Title of host publication: 9th International Conference on Information Science and Technology, ICIST 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6-10
Number of pages: 5
ISBN (Electronic): 9781728121062
Publication status: Published - Aug 2019
Event: 9th International Conference on Information Science and Technology, ICIST 2019 - Hulunbuir, China
Duration: 2 Aug 2019 to 5 Aug 2019

Publication series

Name: 9th International Conference on Information Science and Technology, ICIST 2019

Conference

Conference: 9th International Conference on Information Science and Technology, ICIST 2019
Country/Territory: China
City: Hulunbuir
Period: 2/08/19 to 5/08/19

Keywords

  • Adaptive dynamic programming
  • Data-driven
  • Optimal control
  • Policy gradient
  • Q-learning
  • Reinforcement learning
