OM-TCN: A dynamic and agile opponent modeling approach for competitive games

Yuxi Ma, Meng Shen, Nan Zhang, Xiaoyao Tong, Yuanzhang Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The non-stationarity of the environment is a crucial challenge for competitive Multi-Agent Reinforcement Learning (MARL) because the opponent policy changes constantly. Existing schemes struggle to make the protagonist agent respond agilely to the opponent's changes and the resulting non-stationarity, which limits their applicability. To address the dynamic opponent policy and adapt continuously to the non-stationary environment, we propose OM-TCN, a Temporal Convolutional Network (TCN) model for modeling and predicting opponent behaviors, and apply it to the widely used Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm for competitive MARL. In this work, we collect the opponent's behavior data observed by the protagonist agent and serialize it at the granularity of episodes. We then feed the time-series data into OM-TCN for sequence modeling. OM-TCN learns the historical behaviors of the opponent instead of overfitting to a specific opponent policy, and can predict the opponent's future actions. Finally, we use the predicted opponent actions in place of the history sampled from the replay buffer, and apply the OM-TCN model to the MADDPG framework for decentralized training. We evaluate the proposed method on the competitive scenario of the Multi-agent Particle Environment (MPE). Simulation results show that the protagonist agent learns a more efficient and stable policy and converges more easily than the baselines.
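The sequence-modeling step described in the abstract rests on the causal dilated convolution that defines a TCN. The following is a minimal pure-Python sketch of that one operation, not the authors' OM-TCN architecture: the function names, the one-dimensional action trace, and the untrained all-ones kernels are illustrative assumptions, and the residual connections and learned weights of a practical TCN are omitted.

```python
from typing import List, Sequence


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Causal dilated 1-D convolution.

    output[t] depends only on x[t], x[t - d], x[t - 2d], ... so no future
    value leaks into a prediction. Positions before the start of the
    sequence count as zeros, which keeps the output as long as the input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - j * dilation  # present and past positions only
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Time steps visible to one output, assuming one convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


def tcn_stack(
    x: Sequence[float],
    kernels: Sequence[Sequence[float]],
    dilations: Sequence[int],
) -> List[float]:
    """Stack causal dilated convolutions with a ReLU after each layer."""
    h = list(x)
    for kernel, d in zip(kernels, dilations):
        h = [max(0.0, v) for v in causal_dilated_conv1d(h, kernel, d)]
    return h


# Hypothetical opponent action trace for a single episode (assumption:
# one scalar action per time step, purely for illustration).
episode_actions = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
kernels = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # untrained, illustrative weights
dilations = [1, 2, 4]  # doubling dilations widen the history seen per layer

features = tcn_stack(episode_actions, kernels, dilations)
```

With a kernel size of 2 and dilations 1, 2, 4, each output covers 8 time steps, so the single action at step 1 influences every later feature but none before it. Doubling the dilation per layer is what lets a shallow stack cover a long stretch of an episode's opponent history.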

Original language: English
Pages (from-to): 405-414
Number of pages: 10
Journal: Information Sciences
Volume: 615
DOI
Publication status: Published - Nov 2022
