Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games

Xinxing Li, Zhihong Peng*, Lei Jiao, Lele Xi, Junqi Cai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

A model-based offline policy iteration (PI) algorithm and a model-free online Q-learning algorithm are proposed for solving fully cooperative linear quadratic dynamic games. The PI-based adaptive Q-learning method can learn the feedback Nash equilibrium online from state samples generated by behavior policies, without requiring any knowledge of the system model. Unlike existing Q-learning methods, the proposed algorithm carries out both policy evaluation and policy improvement in an adaptive manner. We prove the convergence of the offline PI algorithm by showing that it is equivalent to Newton's method applied to the game algebraic Riccati equation (GARE). Furthermore, we prove that the proposed Q-learning method converges to the Nash equilibrium under a small learning rate, provided that certain persistence of excitation conditions are satisfied, which can be easily achieved with suitable behavior policies. Simulation results demonstrate the good performance of the proposed online adaptive Q-learning algorithm.
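To illustrate the offline PI step described above, the following is a minimal sketch (not the authors' code) assuming a discrete-time, two-player fully cooperative LQ game in which both players share a common quadratic cost. In that setting, stacking the players' input matrices reduces the game to a single LQR problem, and the PI iteration becomes the classical Kleinman/Newton iteration on the resulting Riccati equation, consistent with the abstract's equivalence claim. All matrices and values below are illustrative placeholders; the paper's exact formulation (time setting, number of players, cost weights) may differ.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative system and cost data (hypothetical values).
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B1 = np.array([[0.0], [0.1]])
B2 = np.array([[0.1], [0.0]])
Q = np.eye(2)
R = np.diag([1.0, 2.0])      # block-diagonal weight on the stacked input

# Fully cooperative game: both players minimize the same cost, so the
# stacked input matrix turns the GARE into a standard discrete-time ARE.
B = np.hstack([B1, B2])

K = np.zeros((2, 2))         # initial stabilizing joint feedback gain (A is Schur stable)
for _ in range(50):
    # Policy evaluation: solve the Lyapunov equation
    #   (A - B K)^T P (A - B K) - P + Q + K^T R K = 0
    A_cl = A - B @ K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Policy improvement (one Newton step on the Riccati equation):
    #   K <- (R + B^T P B)^{-1} B^T P A
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

print("Joint feedback gain at convergence:\n", K)

The model-free online Q-learning algorithm in the paper replaces the Lyapunov solve above with least-squares estimation of a Q-function from sampled data, so the iteration can run without access to A and B.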

Original language: English
Article number: 222201
Journal: Science China Information Sciences
Volume: 62
Issue number: 12
DOIs
Publication status: Published - 1 Dec 2019

Keywords

  • Q-learning
  • adaptive dynamic programming
  • fully cooperative linear quadratic dynamic games
  • off-policy
  • policy iteration
  • reinforcement learning
