Policy iteration based Q-learning for linear nonzero-sum quadratic differential games

Xinxing Li, Zhihong Peng*, Li Liang, Wenzhong Zha

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)

Abstract

In this paper, a policy iteration-based Q-learning algorithm is proposed to solve infinite-horizon linear nonzero-sum quadratic differential games with completely unknown dynamics. The Q-learning algorithm, which employs off-policy reinforcement learning (RL), learns the Nash equilibrium and the corresponding value functions online, using data sets generated by behavior policies. First, we prove the equivalence between the proposed off-policy Q-learning algorithm and an offline policy iteration (PI) algorithm by selecting specific initial admissible policies that can be learned online. Then, the convergence of the off-policy Q-learning algorithm is proved under a mild rank condition that can easily be met by injecting appropriate probing noises into the behavior policies. The generated data sets can be reused throughout the learning process, which makes the algorithm computationally efficient. Simulation results demonstrate the effectiveness of the proposed Q-learning algorithm.
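To make the policy-iteration structure behind the algorithm concrete, below is a minimal model-based sketch of PI for a two-player nonzero-sum LQ differential game. It is not the paper's method: the paper's Q-learning is model-free and learns from measured data, whereas this sketch assumes the dynamics (A, B1, B2) are known and solves the policy-evaluation step exactly with SciPy's solve_continuous_lyapunov. All system and cost matrices here are hypothetical example data, chosen so that the zero gains are an admissible initial policy.

```python
# Minimal model-based PI sketch for a two-player nonzero-sum LQ game.
# NOTE: a simplified illustration with known dynamics, not the paper's
# model-free off-policy Q-learning. All matrices are hypothetical.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical open-loop-stable dynamics, so K1 = K2 = 0 is admissible.
A  = np.array([[-1.0,  1.0],
               [ 0.0, -2.0]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])

# Quadratic cost weights for players 1 and 2 (also hypothetical).
Q1, Q2   = np.eye(2), 2.0 * np.eye(2)
R11, R22 = np.array([[1.0]]), np.array([[1.0]])
R12, R21 = np.array([[1.0]]), np.array([[1.0]])

K1 = np.zeros((1, 2))  # initial admissible policy, player 1
K2 = np.zeros((1, 2))  # initial admissible policy, player 2

for it in range(50):
    Ac = A - B1 @ K1 - B2 @ K2  # closed-loop matrix under current policies

    # Policy evaluation: coupled Lyapunov equations
    #   Ac' Pi + Pi Ac + Qi + K1' Ri1 K1 + K2' Ri2 K2 = 0
    P1 = solve_continuous_lyapunov(
        Ac.T, -(Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2))
    P2 = solve_continuous_lyapunov(
        Ac.T, -(Q2 + K1.T @ R21 @ K1 + K2.T @ R22 @ K2))

    # Policy improvement: Ki <- Rii^{-1} Bi' Pi
    K1_new = np.linalg.solve(R11, B1.T @ P1)
    K2_new = np.linalg.solve(R22, B2.T @ P2)

    gap = max(np.max(np.abs(K1_new - K1)), np.max(np.abs(K2_new - K2)))
    K1, K2 = K1_new, K2_new
    if gap < 1e-10:  # gains have converged to the coupled-ARE solution
        break

print("Nash feedback gains:\nK1 =", K1, "\nK2 =", K2)
```

In the model-free setting described in the abstract, these exact Lyapunov solves would be replaced by least-squares estimation of Q-function parameters from trajectory data generated by behavior policies, with probing noise ensuring the rank condition needed for a unique solution.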

Original language: English
Article number: 52204
Journal: Science China Information Sciences
Volume: 62
Issue: 5
DOI
Publication status: Published - 1 May 2019
