OM-TCN: A dynamic and agile opponent modeling approach for competitive games

Yuxi Ma, Meng Shen, Nan Zhang, Xiaoyao Tong, Yuanzhang Li^*

doi:10.1016/j.ins.2022.08.101

^*Corresponding author for this work

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-review

2 Citations (Scopus)

Abstract

The non-stationarity of the environment is a crucial challenge for competitive Multi-Agent Reinforcement Learning (MARL), because the opponent's policy changes constantly. Existing schemes struggle to make the protagonist agent respond agilely to the opponent's changes and the resulting non-stationarity, which limits their applicability. To handle a dynamic opponent policy and adapt continuously to the non-stationary environment, we propose OM-TCN, a Temporal Convolutional Network (TCN) model for modeling and predicting opponent behavior, and apply it to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), a widely used algorithm for competitive MARL. We collect the opponent's behavior data as observed by the protagonist agent and serialize it at the granularity of episodes. We then feed this time-series data into OM-TCN for sequence modeling. OM-TCN learns the opponent's historical behavior instead of overfitting to a specific opponent policy, and predicts the opponent's future actions. Finally, we replace the opponent actions sampled from the replay buffer with these predictions, and integrate OM-TCN into the MADDPG framework for decentralized training. We evaluate the proposed method in the competitive scenario of the Multi-agent Particle Environment (MPE). Simulation results show that the protagonist agent learns a more efficient and stable policy and converges more easily than the baselines.
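The pipeline the abstract describes (episode-level serialization of opponent actions, causal sequence modeling with a TCN, and substituting predicted opponent actions into the MADDPG update) can be sketched as below. This is not the authors' implementation: the hidden size, kernel size, doubling dilation schedule, tanh output, random initialization, and the helper names `serialize_episodes`, `causal_conv`, `TinyTCN`, and `critic_input` are all hypothetical. Training is omitted and only the forward pass is shown. The critic helper assumes, as in standard MADDPG, that the protagonist's critic receives the other agents' actions as input.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def serialize_episodes(episodes: Sequence[Sequence[Vec]], window: int) -> List[Tuple[List[Vec], Vec]]:
    """Cut each episode's opponent-action sequence into (history window, next action) pairs.
    Windows never span two episodes, i.e. the data is serialized at episode granularity."""
    pairs = []
    for ep in episodes:
        for t in range(window, len(ep)):
            pairs.append(([list(a) for a in ep[t - window:t]], list(ep[t])))
    return pairs


def causal_conv(seq: List[Vec], w: List[List[Vec]], b: Vec, dilation: int) -> List[Vec]:
    """Causal dilated 1-D convolution with w shaped [kernel][out_ch][in_ch].
    out[t] uses only seq[t - j * dilation] for j >= 0; inputs before t = 0 count as zeros."""
    out = []
    for t in range(len(seq)):
        y = list(b)
        for j, w_j in enumerate(w):
            src = t - j * dilation
            if src < 0:
                break  # later taps reach even further back, into the zero padding
            for o, row in enumerate(w_j):
                y[o] += sum(wi * xi for wi, xi in zip(row, seq[src]))
        out.append(y)
    return out


class TinyTCN:
    """Hypothetical minimal TCN: a 1x1 input projection, residual causal conv blocks with
    dilations 1, 2, 4, ..., and a linear head on the last time step."""

    def __init__(self, act_dim: int, hidden: int = 8, kernel: int = 2, levels: int = 3, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> List[Vec]:
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        self.kernel = kernel
        self.dilations = [2 ** i for i in range(levels)]
        self.inp = rand(hidden, act_dim)
        self.blocks = [([rand(hidden, hidden) for _ in range(kernel)], [0.0] * hidden)
                       for _ in self.dilations]
        self.head = rand(act_dim, hidden)

    def receptive_field(self) -> int:
        # Each block with dilation d looks (kernel - 1) * d steps further into the past.
        return 1 + sum((self.kernel - 1) * d for d in self.dilations)

    def features(self, history: List[Vec]) -> List[Vec]:
        h = [[sum(w * x for w, x in zip(row, a)) for row in self.inp] for a in history]
        for (w, b), d in zip(self.blocks, self.dilations):
            conv = causal_conv(h, w, b, d)
            # Residual connection around a ReLU-activated causal convolution.
            h = [[hv + max(0.0, cv) for hv, cv in zip(h_t, c_t)] for h_t, c_t in zip(h, conv)]
        return h

    def predict_next(self, history: List[Vec]) -> Vec:
        last = self.features(history)[-1]
        # tanh bounds each component, assuming actions normalized to [-1, 1].
        return [math.tanh(sum(w * x for w, x in zip(row, last))) for row in self.head]


def critic_input(own_obs: Vec, own_act: Vec, opp_history: List[Vec], model: TinyTCN) -> Vec:
    """Hypothetical critic input: the opponent-action slot is filled with the model's
    prediction instead of the opponent action stored in the replay buffer."""
    return own_obs + own_act + model.predict_next(opp_history)


# Usage on synthetic opponent trajectories (two episodes of 10 steps, 2-D actions).
episodes = [[[math.sin(0.3 * t + k), math.cos(0.3 * t + k)] for t in range(10)] for k in range(2)]
pairs = serialize_episodes(episodes, window=4)
model = TinyTCN(act_dim=2)
history, target = pairs[0]
x = critic_input([0.1, -0.2, 0.5], [0.3, 0.0], history, model)
```

Doubling the dilation at each level makes the receptive field grow exponentially with depth (8 steps here, with kernel size 2 and three levels), which is a common reason to choose a TCN for sequence modeling. Causal padding guarantees that a prediction never looks at future opponent actions.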

Original language	English
Pages (from-to)	405-414
Number of pages	10
Journal	Information Sciences
Volume	615
DOI	https://doi.org/10.1016/j.ins.2022.08.101
Publication status	Published - Nov 2022


Cite this

@article{881a842e69c648948d8d1c0ad0b8a1ed,
  title = "OM-TCN: A dynamic and agile opponent modeling approach for competitive games",
  abstract = "The non-stationarity of the environment is a crucial challenge for competitive Multi-Agent Reinforcement Learning (MARL) due to the constantly changing opponent policy. Existing schemes are challenging to make the protagonist agent that agilely responds to the opponent's changes and the resulting non-stationarity, which may inevitably limit their applicability. To address the dynamic opponent policy and adapt to the non-stationary environment continuously, we propose a Temporal Convolutional Network (TCN) model for modeling and predicting opponent behaviors called OM-TCN, and apply it to the widely-used Multi-Agent Deep Deterministic Policies Gradient (MADDPG) algorithm of competitive MARL. In this work, we collect the opponent's behavior data observed by the protagonist agent and serialize it in granularity of episodes. Then we input the time-series data into OM-TCN for sequence modeling. The OM-TCN learns the historical behaviors of the opponent instead of overfitting to a specific opponent policy, and can make predictions about the opponent's future actions. Finally, we use predictions of opponent actions in place of the history sampled from the playback buffer, and apply the OM-TCN model to the MADDPG framework for decentralized training. We use the competitive scenario of Multi-agent Particle Environment (MPE) to evaluate the proposed method. Simulation results show that the protagonist agent is able to learn more efficient and stable policy and converge easier than other baselines.",
  keywords = "Competitive game, Multi-agent system, Opponent modeling, Reinforcement learning, Temporal convolutional network",
  author = "Yuxi Ma and Meng Shen and Nan Zhang and Xiaoyao Tong and Yuanzhang Li",
  note = "Publisher Copyright: {\textcopyright} 2022",
  year = "2022",
  month = nov,
  doi = "10.1016/j.ins.2022.08.101",
  language = "English",
  volume = "615",
  pages = "405--414",
  journal = "Information Sciences",
  issn = "0020-0255",
  publisher = "Elsevier Inc.",
}

TY  - JOUR
T1  - OM-TCN
T2  - A dynamic and agile opponent modeling approach for competitive games
AU  - Ma, Yuxi
AU  - Shen, Meng
AU  - Zhang, Nan
AU  - Tong, Xiaoyao
AU  - Li, Yuanzhang
PY  - 2022/11
Y1  - 2022/11
N2  - The non-stationarity of the environment is a crucial challenge for competitive Multi-Agent Reinforcement Learning (MARL) due to the constantly changing opponent policy. Existing schemes are challenging to make the protagonist agent that agilely responds to the opponent's changes and the resulting non-stationarity, which may inevitably limit their applicability. To address the dynamic opponent policy and adapt to the non-stationary environment continuously, we propose a Temporal Convolutional Network (TCN) model for modeling and predicting opponent behaviors called OM-TCN, and apply it to the widely-used Multi-Agent Deep Deterministic Policies Gradient (MADDPG) algorithm of competitive MARL. In this work, we collect the opponent's behavior data observed by the protagonist agent and serialize it in granularity of episodes. Then we input the time-series data into OM-TCN for sequence modeling. The OM-TCN learns the historical behaviors of the opponent instead of overfitting to a specific opponent policy, and can make predictions about the opponent's future actions. Finally, we use predictions of opponent actions in place of the history sampled from the playback buffer, and apply the OM-TCN model to the MADDPG framework for decentralized training. We use the competitive scenario of Multi-agent Particle Environment (MPE) to evaluate the proposed method. Simulation results show that the protagonist agent is able to learn more efficient and stable policy and converge easier than other baselines.
AB  - The non-stationarity of the environment is a crucial challenge for competitive Multi-Agent Reinforcement Learning (MARL) due to the constantly changing opponent policy. Existing schemes are challenging to make the protagonist agent that agilely responds to the opponent's changes and the resulting non-stationarity, which may inevitably limit their applicability. To address the dynamic opponent policy and adapt to the non-stationary environment continuously, we propose a Temporal Convolutional Network (TCN) model for modeling and predicting opponent behaviors called OM-TCN, and apply it to the widely-used Multi-Agent Deep Deterministic Policies Gradient (MADDPG) algorithm of competitive MARL. In this work, we collect the opponent's behavior data observed by the protagonist agent and serialize it in granularity of episodes. Then we input the time-series data into OM-TCN for sequence modeling. The OM-TCN learns the historical behaviors of the opponent instead of overfitting to a specific opponent policy, and can make predictions about the opponent's future actions. Finally, we use predictions of opponent actions in place of the history sampled from the playback buffer, and apply the OM-TCN model to the MADDPG framework for decentralized training. We use the competitive scenario of Multi-agent Particle Environment (MPE) to evaluate the proposed method. Simulation results show that the protagonist agent is able to learn more efficient and stable policy and converge easier than other baselines.
KW  - Competitive game
KW  - Multi-agent system
KW  - Opponent modeling
KW  - Reinforcement learning
KW  - Temporal convolutional network
UR  - http://www.scopus.com/inward/record.url?scp=85140075611&partnerID=8YFLogxK
U2  - 10.1016/j.ins.2022.08.101
DO  - 10.1016/j.ins.2022.08.101
M3  - Article
AN  - SCOPUS:85140075611
SN  - 0020-0255
VL  - 615
SP  - 405
EP  - 414
JO  - Information Sciences
JF  - Information Sciences
ER  - 
