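Reinforcement learning and adaptive/approximate dynamic programming: A survey from theory to applications in multi-agent systems (强化学习与自适应动态规划: 从基础理论到多智能体系统中的应用进展综述)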

Guang Hui Wen, Tao Yang*, Jia Ling Zhou, Jun Jie Fu, Lei Xu

*Corresponding author for this work

Research output: Contribution to journal, review article, peer-reviewed

7 Citations (Scopus)

Abstract

Reinforcement learning (RL) and adaptive/approximate dynamic programming (ADP) algorithms have recently attracted considerable attention across several scientific fields, including artificial intelligence, systems and control, and applied mathematics. This is partly due to their successful application to a series of challenging problems, such as sequential decision-making and optimal coordination control of large-scale multi-agent systems. This paper first introduces some preliminaries on RL and ADP algorithms, and then reviews the development of these two closely related families of algorithms in their respective research fields, with emphasis on the progression from the sequential decision (optimal control) problem for a single agent (control plant) to the sequential decision (optimal coordination control) problem for multi-agent systems. Furthermore, after briefly surveying the structural evolution of the ADP algorithm over recent decades and its recent development from a model-based offline programming framework to a model-free online learning framework, the paper reviews research progress on using the ADP algorithm to solve the optimal coordination control problem of multi-agent systems. Finally, some interesting yet challenging open issues are suggested, concerning multi-agent reinforcement learning (MARL) algorithms and the use of ADP algorithms for the optimal coordination control of multi-agent systems.

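Translated title of the contribution: Reinforcement learning and adaptive/approximate dynamic programming: A survey from theory to applications in multi-agent systems
Original language: Chinese (Traditional)
Pages (from-to): 1200-1230
Number of pages: 31
Journal: Kongzhi yu Juece/Control and Decision
Volume: 38
Issue number: 5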
DOI
Publication status: Published - May 2023

Keywords

  • Markov decision process
  • adaptive/approximate dynamic programming
  • multi-agent system
  • optimal coordination control
  • reinforcement learning
  • sequential decision
