OM-TCN: A dynamic and agile opponent modeling approach for competitive games

Yuxi Ma, Meng Shen, Nan Zhang, Xiaoyao Tong, Yuanzhang Li*

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

2 Citations (Scopus)

Abstract

Non-stationarity of the environment is a crucial challenge for competitive Multi-Agent Reinforcement Learning (MARL) because the opponent's policy changes constantly. Existing schemes struggle to produce a protagonist agent that responds agilely to the opponent's changes and the resulting non-stationarity, which may limit their applicability. To address the dynamic opponent policy and adapt continuously to the non-stationary environment, we propose OM-TCN, a Temporal Convolutional Network (TCN) model for modeling and predicting opponent behaviors, and apply it to the widely used Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm for competitive MARL. We collect the opponent's behavior data as observed by the protagonist agent and serialize it at the granularity of episodes, then feed the resulting time-series data into OM-TCN for sequence modeling. OM-TCN learns the opponent's historical behaviors rather than overfitting to a specific opponent policy, and can predict the opponent's future actions. Finally, we use the predicted opponent actions in place of the history sampled from the replay buffer, and apply the OM-TCN model to the MADDPG framework for decentralized training. We evaluate the proposed method in the competitive scenario of the Multi-agent Particle Environment (MPE). Simulation results show that the protagonist agent learns a more efficient and stable policy and converges more easily than the baselines.
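As a rough illustration of the sequence-modeling idea in the abstract, the sketch below shows two generic building blocks: slicing one episode of observed opponent actions into (history, next action) training pairs, and a causal dilated 1-D convolution of the kind a TCN stacks. It is not the authors' implementation. The function names `make_windows`, `causal_dilated_conv` and `receptive_field` are hypothetical, actions are simplified to scalars, the weights are fixed rather than learned, and no MADDPG integration is shown.

```python
from typing import List, Sequence, Tuple


def make_windows(
    episode: Sequence[float], history: int
) -> List[Tuple[List[float], float]]:
    """Split one episode of opponent actions into (past window, next action) pairs.

    Hypothetical helper: assumes each action is a single scalar.
    """
    return [
        (list(episode[t - history:t]), episode[t])
        for t in range(history, len(episode))
    ]


def causal_dilated_conv(
    x: Sequence[float], weights: Sequence[float], dilation: int
) -> List[float]:
    """1-D causal convolution with dilation.

    output[t] depends only on x[t], x[t - d], x[t - 2d], ...
    Positions before the start of the sequence count as zero (left padding),
    so no output ever sees a future input.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Past steps visible to one output of a stack with one causal conv per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    episode = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5]  # made-up opponent actions
    for past, nxt in make_windows(episode, history=3):
        print(past, "->", nxt)
    print(causal_dilated_conv(episode, weights=[0.5, 0.5], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4]))
```

Doubling the dilation at each layer lets the receptive field grow quickly with depth, which is the usual reason for choosing a TCN when long stretches of history should inform a prediction.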

Original language: English
Pages (from-to): 405-414
Number of pages: 10
Journal: Information Sciences
Volume: 615
Publication status: Published - Nov 2022

Keywords

  • Competitive game
  • Multi-agent system
  • Opponent modeling
  • Reinforcement learning
  • Temporal convolutional network
