Finding the Equilibrium for Continuous Constrained Markov Games under the Average Criteria

Xiaofeng Jiang, Shuangwu Chen, Jian Yang*, Han Hu, Zhenliang Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

In Markov games with cost constraints and continuous actions, the local constraint of a single decision maker results from the interaction of the joint actions taken by the other decision makers. Such constraints are usually eliminated by imposing penalties on undesired states and policies, an approach that may suffer from the failure of the penalties as the game policy changes and from the nonexistence of mixed policies. In this article, an actor-critic deep neural network framework is used to solve this problem. The actor network establishes a continuous pure policy to replace the mixed policy, and the critic network converts the global interaction results into a local performance potential. The local search for a constrained equilibrium average objective is thereby converted into an unconstrained minimax optimization. Based on this equivalent conversion, an optimality function of the local action is given to evaluate the influence of a single decision maker's action on the global system. The proposed algorithm iterates the local constraint multiplier and the policy simultaneously along opposite directions, and a typical congestion-control numerical example from the emerging Internet of Things demonstrates its efficiency.
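The primal-dual idea in the last sentence, ascending the constraint multiplier while descending the policy parameter, can be sketched on a toy constrained problem. This is a minimal illustration, not the paper's game: the objective, constraint, step sizes, and the function name `primal_dual` are all illustrative assumptions.

```python
def primal_dual(alpha=0.05, beta=0.05, iters=2000):
    """Gradient descent-ascent on the Lagrangian
        L(theta, lam) = (theta - 2)**2 + lam * (theta - 1),
    i.e. minimize (theta - 2)**2 subject to theta <= 1.
    The primal variable theta descends L while the multiplier lam
    ascends it (projected to stay nonnegative) -- the two variables
    move along opposite directions, as in the algorithm above."""
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        grad_theta = 2.0 * (theta - 2.0) + lam       # dL/dtheta
        theta -= alpha * grad_theta                  # descend in theta
        lam = max(0.0, lam + beta * (theta - 1.0))   # ascend in lam
    return theta, lam

theta, lam = primal_dual()
# KKT saddle point of this toy problem: theta = 1, lam = 2
```

With small step sizes the iterates converge to the saddle point of the Lagrangian; in the paper the same opposite-direction update is applied to the policy and the local constraint multiplier.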

Original language: English
Article number: 8972586
Pages (from-to): 5399-5406
Number of pages: 8
Journal: IEEE Transactions on Automatic Control
Volume: 65
Issue number: 12
DOI
Publication status: Published - Dec 2020
