TY - JOUR
T1 - Finding the Equilibrium for Continuous Constrained Markov Games under the Average Criteria
AU - Jiang, Xiaofeng
AU - Chen, Shuangwu
AU - Yang, Jian
AU - Hu, Han
AU - Zhang, Zhenliang
N1 - Publisher Copyright: © 1963-2012 IEEE.
PY - 2020/12
Y1 - 2020/12
AB - For Markov games with cost constraints and continuous actions, the local constraint of a single decision maker results from the interaction of the joint actions taken by the other decision makers. It is usually eliminated by imposing penalties on undesired states and policies, an approach that may fail because the penalties can become ineffective as the game policy changes and because mixed policies may not exist. In this article, an actor-critic deep neural network framework is used to solve this problem. The actor network establishes a continuous pure policy to replace the mixed policy, and the critic network converts the global interaction results into a local performance potential. The local search for a constrained equilibrium average objective is converted into an unconstrained minimax optimization. Based on this equivalent conversion, the optimality function of the local action is given to evaluate the influence of a single decision maker's action on the global system. The proposed algorithm simultaneously iterates the local constraint multiplier and the policy along opposite directions, and numerical results for a typical congestion control problem in the emerging Internet of Things demonstrate its efficiency.
KW - Constrained Markov game (MG)
KW - continuous state and action
KW - expected average criteria
KW - optimality equation
KW - performance potential
UR - http://www.scopus.com/inward/record.url?scp=85097653359&partnerID=8YFLogxK
U2 - 10.1109/TAC.2020.2970153
DO - 10.1109/TAC.2020.2970153
M3 - Article
AN - SCOPUS:85097653359
SN - 0018-9286
VL - 65
SP - 5399
EP - 5406
JO - IEEE Transactions on Automatic Control
JF - IEEE Transactions on Automatic Control
IS - 12
M1 - 8972586
ER -