BrainyHGNN: Brain-Inspired Memory Retrieval and Cross-modal Interaction for Emotion Recognition in Conversations

Qixin Wang, Ziyu Li, Xiuxing Li, Tianyuan Jia, Qing Li, Li Yao, Xia Wu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Research on emotion recognition in conversations emphasises the importance of complex relationships between conversational context and multimodality. Graph-based methods, particularly hypergraph-based ones, have shown promise in capturing these relationships. However, challenges persist in avoiding redundant context while capturing the information essential for optimal context embeddings, and in fully leveraging cross-modal complementarities for sufficient fusion. In contrast, the human brain flexibly retrieves relevant memories and integrates multimodal data for accurate recognition. Motivated by this capability, we propose BrainyHGNN, a brain-inspired hypergraph neural network. It integrates a Dynamic Memory Selector for contextual hyperedges, mimicking selective memory retrieval mechanisms to achieve adaptive, modality-specific context retrieval. HierSensNet is designed for multi-modal hyperedges, mirroring hierarchical cross-modal interaction mechanisms to ensure effective multimodal fusion. Experimental results on two benchmark datasets validate the superior performance of BrainyHGNN, confirming the effectiveness of its design. This work highlights the potential of brain-inspired methods to advance flexible context retrieval and sufficient multimodal fusion, presenting a promising direction for future research in this domain.
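For readers unfamiliar with hypergraph neural networks, the following is a minimal, generic sketch of a single hypergraph convolution layer of the kind such models build on (in the style of Feng et al.'s HGNN update, X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ). It is not the authors' implementation: the function name, shapes, and the toy incidence matrix below are illustrative assumptions only. In this setting, rows of H would index utterance nodes and columns would index hyperedges (e.g. a contextual hyperedge grouping retrieved context utterances, or a multi-modal hyperedge grouping one utterance's modality views).

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """Generic hypergraph convolution (illustrative, not BrainyHGNN itself).

    X       : (n_nodes, d_in)  node feature matrix
    H       : (n_nodes, n_edges) incidence matrix (H[v, e] = 1 if node v is in hyperedge e)
    Theta   : (d_in, d_out)    learnable weight matrix
    edge_w  : (n_edges,)       optional per-hyperedge weights (defaults to 1)
    """
    n_nodes, n_edges = H.shape
    w = np.ones(n_edges) if edge_w is None else np.asarray(edge_w, dtype=float)

    dv = H @ w                 # weighted node degrees
    de = H.sum(axis=0)         # hyperedge degrees (nodes per hyperedge)

    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    W = np.diag(w)

    # Normalised propagation: node -> hyperedge aggregation, then hyperedge -> node.
    A = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)  # ReLU non-linearity

# Toy example: 4 utterance nodes, 2 hyperedges
# (edge 0 groups utterances 0-2, edge 1 groups utterances 1-3).
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
X = np.random.default_rng(0).standard_normal((4, 8))
Theta = np.random.default_rng(1).standard_normal((8, 4))
out = hypergraph_conv(X, H, Theta)
```

Each layer thus mixes the features of all nodes sharing a hyperedge, which is what lets contextual and multi-modal hyperedges inject context and cross-modal information into every utterance embedding.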

Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Brain-inspired methods
  • emotion recognition in conversations
  • hypergraph neural networks
  • multimodal fusion
