A Multi-Level Alignment and Cross-Modal Unified Semantic Graph Refinement Network for Conversational Emotion Recognition

Xiaoheng Zhang, Weigang Cui, Bin Hu, Yang Li

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Multimodal emotion recognition in conversation (ERC) has attracted enormous attention. However, most prior work simply concatenates multimodal representations, neglecting cross-modal correspondences and uncertain factors, which leads to cross-modal misalignment. Furthermore, recent methods consider only simple contextual features and commonly ignore semantic clues, so they capture semantic consistency insufficiently. To address these limitations, we propose a novel multi-level alignment and cross-modal unified semantic graph refinement network (MA-CMU-SGRNet) for the ERC task. Specifically, a multi-level alignment (MA) module is first designed to bridge the gap between the acoustic and lexical modalities; it contrasts both instance-level and prototype-level relationships, separating the multimodal features in the latent space. Second, a cross-modal uncertainty-aware unification (CMU) module is adopted to generate a unified representation in a joint space that accounts for the ambiguity of emotion. Finally, a dual-encoding semantic graph refinement network (SGRNet) is introduced, comprising a syntactic encoder that aggregates information from near neighbors and a semantic encoder that focuses on useful, semantically close neighbors. Extensive experiments on three public multimodal datasets show the effectiveness of the proposed method compared with state-of-the-art methods, indicating its potential for conversational emotion recognition. The implementation code is available at https://github.com/zxiaohen/MA-CMU-SGRNet.
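The abstract does not give the loss used for multi-level alignment, so the following is only a minimal sketch of what instance-level contrast between acoustic and lexical features could look like. It assumes a symmetric InfoNCE objective, a common choice for cross-modal alignment but not confirmed as the authors' formulation. The function names, the temperature value, and the toy vectors are all hypothetical, and the prototype-level term is omitted.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, candidates, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to candidates[i].

    Every other candidate in the batch acts as a negative. This is an
    assumed stand-in for the paper's instance-level contrast.
    """
    a = [l2_normalize(v) for v in anchors]
    c = [l2_normalize(v) for v in candidates]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, c_j)) / temperature for c_j in c]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_alignment_loss(acoustic, lexical, temperature=0.1):
    """Average the acoustic-to-lexical and lexical-to-acoustic losses."""
    return 0.5 * (
        info_nce(acoustic, lexical, temperature)
        + info_nce(lexical, acoustic, temperature)
    )


# Toy utterance embeddings (hypothetical): row i of each modality is the same utterance.
acoustic = [[1.0, 0.0], [0.0, 1.0]]
lexical_aligned = [[0.9, 0.1], [0.1, 0.9]]
lexical_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = symmetric_alignment_loss(acoustic, lexical_aligned)
loss_swapped = symmetric_alignment_loss(acoustic, lexical_swapped)
```

In this toy setup the loss is lower when each acoustic vector is paired with its own lexical vector than when the pairs are swapped, which is the behavior an alignment objective is meant to reward.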

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE Transactions on Affective Computing
Publication status: Accepted/In press - 2024
Externally published: Yes

Keywords

  • Context modeling
  • Emotion recognition
  • Self-supervised learning
  • Semantics
  • Syntactics
  • Task analysis
  • Uncertainty
  • cross-modal alignment
  • multimodal fusion
  • semantic refinement
