DeepMSD: Advancing Multimodal Sarcasm Detection Through Knowledge-Augmented Graph Reasoning

  • Yiwei Wei
  • Hengyang Zhou
  • Shaozu Yuan
  • Meng Chen
  • Haitao Shi
  • Zhiyang Jia
  • Longbiao Wang*
  • Xiaodong He
  • *Corresponding author for this work
  • Tianjin University
  • China University of Petroleum-Beijing at Karamay
  • JD AI Research
  • Ltd.

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal sarcasm detection (MSD) requires predicting the sarcastic sentiment by understanding diverse modalities of data (e.g., text, image). Beyond the surface-level information conveyed in the post data, understanding the underlying deep-level knowledge, such as the background and intent behind the data, is crucial for understanding the sarcastic sentiment. However, previous works have often overlooked this aspect, limiting their potential to achieve superior performance. To tackle this challenge, we propose DeepMSD, a novel framework that generates supplemental deep-level knowledge to enhance the understanding of sarcastic content. Specifically, we first devise a Deep-level Knowledge Extraction Module that leverages large vision-language models to generate deep-level information behind the text-image pairs. Additionally, we devise a Cross-knowledge Graph Reasoning Module to model how humans use prior knowledge to identify sarcastic cues in multimodal posts. This module constructs cross-knowledge graphs that connect deep-level knowledge with surface-level knowledge. As such, it enables a more profound exploration of the cues underlying sarcasm. Experiments on the public MSD dataset demonstrate that our approach significantly surpasses previous state-of-the-art methods.
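
As a rough illustration of what connecting deep-level and surface-level knowledge in a graph could look like, the following is a minimal sketch and not the DeepMSD implementation. The abstract does not specify how nodes are embedded, how edges are chosen, or how messages are passed, so everything here is an assumption made for illustration: the cosine-similarity threshold, the function names `build_cross_graph` and `propagate`, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Edge = Tuple[int, int, float]  # (surface index, deep index, edge weight)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_cross_graph(surface: List[Vector], deep: List[Vector],
                      threshold: float = 0.5) -> List[Edge]:
    """Hypothetical edge rule: link a surface node to a deep-knowledge node
    when their cosine similarity reaches the threshold (an assumption)."""
    edges = []
    for i, s in enumerate(surface):
        for j, d in enumerate(deep):
            w = cosine(s, d)
            if w >= threshold:
                edges.append((i, j, w))
    return edges


def propagate(surface: List[Vector], deep: List[Vector],
              edges: List[Edge]) -> List[Vector]:
    """One round of message passing: each surface node adds the
    weight-averaged features of its linked deep-knowledge nodes."""
    updated = [list(s) for s in surface]
    for i in range(len(surface)):
        neighbours = [(j, w) for (si, j, w) in edges if si == i]
        total = sum(w for _, w in neighbours)
        if total == 0:
            continue  # no linked deep-level knowledge; keep the node as is
        for k in range(len(updated[i])):
            message = sum(w * deep[j][k] for j, w in neighbours) / total
            updated[i][k] += message
    return updated


# Toy embeddings standing in for surface-level (post text/image) nodes
# and deep-level (generated background/intent) nodes.
surface_nodes = [[1.0, 0.0], [0.0, 1.0]]
deep_nodes = [[1.0, 0.0], [-1.0, 0.0]]

edges = build_cross_graph(surface_nodes, deep_nodes)
fused = propagate(surface_nodes, deep_nodes, edges)
print(edges)
print(fused)
```

In this toy setup only the first surface node is similar enough to a deep-knowledge node to receive a message, so the second surface node is left unchanged. A real system would presumably use learned embeddings and a trained graph network rather than a fixed threshold.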

Original language: English
Pages (from-to): 6413-6423
Number of pages: 11
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 7
DOIs
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Multi-modal sarcasm detection
  • cross-knowledge graph
  • deep-level knowledge
  • large vision-language models
