Policy iteration based Q-learning for linear nonzero-sum quadratic differential games

Xinxing Li, Zhihong Peng*, Li Liang, Wenzhong Zha

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

In this paper, a policy iteration-based Q-learning algorithm is proposed to solve infinite-horizon linear nonzero-sum quadratic differential games with completely unknown dynamics. The Q-learning algorithm, which employs off-policy reinforcement learning (RL), can learn the Nash equilibrium and the corresponding value functions online, using data sets generated by behavior policies. First, we prove the equivalence between the proposed off-policy Q-learning algorithm and an offline policy iteration (PI) algorithm by selecting specific initial admissible policies that can be learned online. Then, the convergence of the off-policy Q-learning algorithm is proved under a mild rank condition that can easily be met by injecting appropriate probing noises into the behavior policies. The generated data sets can be reused throughout the learning process, which makes the algorithm computationally efficient. Simulation results demonstrate the effectiveness of the proposed Q-learning algorithm.
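The abstract gives no equations or code, so the following is only a minimal, illustrative sketch of the offline policy iteration baseline to which the proposed off-policy Q-learning algorithm is proved equivalent: starting from initial admissible (stabilizing) feedback gains, each iteration evaluates the current policies by solving coupled Lyapunov equations and then improves the gains. All system matrices, cost weights, and the specific update rule below are assumptions chosen for the example; the paper's actual algorithm is model-free and learns from data generated by behavior policies instead of using A, B1, and B2.

```python
# Illustrative sketch (not the paper's algorithm): model-based policy iteration
# for a two-player linear nonzero-sum quadratic differential game.
# Dynamics: dx/dt = A x + B1 u1 + B2 u2, with u_i = -K_i x.
# Player i cost: integral of x'Qi x + u1'Ri1 u1 + u2'Ri2 u2 dt.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Assumed example system (A is Hurwitz, so K1 = K2 = 0 is admissible).
A  = np.array([[0.0, 1.0], [-1.0, -1.0]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.0], [0.5]])

# Assumed cost weights.
Q1, Q2   = np.eye(2), 2.0 * np.eye(2)
R11, R12 = np.eye(1), np.eye(1)
R21, R22 = np.eye(1), np.eye(1)

# Initial admissible (stabilizing) policies.
K1 = np.zeros((1, 2))
K2 = np.zeros((1, 2))

for _ in range(100):
    Ac = A - B1 @ K1 - B2 @ K2  # closed-loop matrix under current policies

    # Policy evaluation: solve Ac' Pi + Pi Ac + Qi + K1'Ri1 K1 + K2'Ri2 K2 = 0.
    P1 = solve_continuous_lyapunov(Ac.T, -(Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2))
    P2 = solve_continuous_lyapunov(Ac.T, -(Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2))

    # Policy improvement: Ki <- Rii^{-1} Bi' Pi.
    K1_new = np.linalg.solve(R11, B1.T @ P1)
    K2_new = np.linalg.solve(R22, B2.T @ P2)

    if max(np.max(np.abs(K1_new - K1)), np.max(np.abs(K2_new - K2))) < 1e-10:
        K1, K2 = K1_new, K2_new
        break
    K1, K2 = K1_new, K2_new

print("Approximate Nash feedback gains:")
print("K1 =", K1)
print("K2 =", K2)
```

In the paper's setting, the coupled Lyapunov-equation step above would be replaced by a least-squares solution of Q-function equations built from measured state and input data, which is what allows the dynamics to remain completely unknown.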

Original language: English
Article number: 52204
Journal: Science China Information Sciences
Volume: 62
Issue number: 5
Publication status: Published - 1 May 2019

Keywords

  • ADP
  • PI
  • Q-learning
  • RL
  • adaptive dynamic programming
  • linear nonzero-sum quadratic differential games
  • off-policy
  • policy iteration
  • reinforcement learning
