Interpreting multi-agent reinforcement learning decisions via key feature activation

  • Peizhang Li
  • Qing Fei*
  • Zhen Chen
  • Zhongqi Sun
  • Bo Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Deep Reinforcement Learning (DRL) techniques have demonstrated remarkable performance in autonomous decision-making for unmanned systems, yet their lack of interpretability poses a significant challenge to trust and confidence in real-world applications. Explainable Reinforcement Learning (XRL) is increasingly recognized as a critical technology for addressing this concern. However, existing XRL approaches suffer from limitations such as dependence on external data for indirect interpretation and the requirement of additional information encoding for direct interpretation. To overcome these challenges, this work proposes the Key Feature Activation Interpretation (KFAI) network, which transparently vectorizes the agent's states and actions and forms interpretable linear combinations through routing between them. KFAI replaces the opaque decision pathways typical of DRL with an interpretable activation mapping between states and actions, allowing agents to directly output interpretable deterministic policies. Leveraging KFAI, we develop a multi-agent key-feature reinforcement learning network for multi-agent systems, enabling reliable autonomous decision-making in unmanned swarms. To evaluate the effectiveness of the proposed networks, a comparative study of interpretable modifications to various continuous and discrete multi-agent reinforcement learning (MARL) algorithms was conducted in simulated environments. The experimental results demonstrate that the methods endowed with interpretability through this modification outperform the original baseline networks on decision-making tasks. Furthermore, we evaluate the interpretability of the proposed method along multiple dimensions, including sparsity and faithfulness, and design a novel visualization technique that demonstrates how multi-agent decision-making processes can be explained in both offline and real-time settings.
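The abstract describes policies built as sparse, interpretable linear combinations of state features routed to actions. The following is a minimal hypothetical sketch of that general idea, not the paper's actual KFAI architecture; the function name, routing-matrix layout, and top-k sparsification rule are illustrative assumptions.

```python
# Hypothetical sketch of a key-feature activation policy (NOT the paper's
# exact KFAI network): each action is an explicitly sparse linear
# combination of state features, so every decision can be read off as
# "action a is driven by features f_i and f_j with these weights".

def key_feature_policy(state, routing, top_k=2):
    """Return (actions, contributions) for one agent.

    state   : list of feature values, length n_features
    routing : list of weight rows, shape n_actions x n_features
    top_k   : keep only the k strongest feature contributions per action
    """
    actions, contributions = [], []
    for weights in routing:
        # Per-feature signed contributions to this action
        contrib = [w * s for w, s in zip(weights, state)]
        # Sparsify: keep the top_k contributions by magnitude, zero the rest
        keep = sorted(range(len(contrib)), key=lambda i: abs(contrib[i]))[-top_k:]
        sparse = [c if i in keep else 0.0 for i, c in enumerate(contrib)]
        contributions.append(sparse)   # interpretable explanation per action
        actions.append(sum(sparse))    # deterministic action output
    return actions, contributions

# Toy example: 4 state features, 2 action dimensions
state = [0.5, -1.2, 0.1, 0.8]
routing = [[1.0, 0.0, 2.0, -0.5],    # action 0 routing weights
           [0.3, 1.5, 0.0, 0.2]]     # action 1 routing weights
actions, contrib = key_feature_policy(state, routing)
```

Reading off `contrib` row by row gives a direct explanation of each action in terms of at most `top_k` key features, which is the kind of transparent state-to-action mapping the abstract attributes to KFAI.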

Original language: English
Article number: 133080
Journal: Neurocomputing
Volume: 677
DOIs
Publication status: Published - 7 May 2026
Externally published: Yes

Keywords

  • Deterministic policy
  • Explainable reinforcement learning
  • Key feature activation
  • Multi-agent systems
