DeepMSD: Advancing Multimodal Sarcasm Detection through Knowledge-augmented Graph Reasoning

Yiwei Wei, Hengyang Zhou, Shaozu Yuan, Meng Chen, Haitao Shi, Zhiyang Jia, Longbiao Wang*, Xiaodong He

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal sarcasm detection (MSD) requires predicting sarcastic sentiment by understanding diverse modalities of data (e.g., text and images). Beyond the surface-level information conveyed in a post, understanding the underlying deep-level knowledge, such as the background and intent behind the data, is crucial for recognizing sarcastic sentiment. However, previous works have often overlooked this aspect, limiting their performance. To tackle this challenge, we propose DeepMSD, a novel framework that generates supplemental deep-level knowledge to enhance the understanding of sarcastic content. Specifically, we first devise a Deep-level Knowledge Extraction Module that leverages large vision-language models to generate the deep-level information behind text-image pairs. Additionally, we devise a Cross-knowledge Graph Reasoning Module to model how humans use prior knowledge to identify sarcastic cues in multimodal posts. This module constructs cross-knowledge graphs that connect deep-level knowledge with surface-level knowledge, enabling a more profound exploration of the cues underlying sarcasm. Experiments on the public MSD dataset demonstrate that our approach significantly surpasses previous state-of-the-art methods.
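To make the two-module idea in the abstract concrete, the sketch below shows one plausible way to pair surface-level post content with LVLM-generated deep-level knowledge and link the two as nodes in a simple cross-knowledge graph. It is a minimal illustration only, not the authors' implementation: the names `query_lvlm`, `CrossKnowledgeGraph`, and `build_graph` are hypothetical, and the LVLM call is a stand-in placeholder.

```python
# Minimal sketch (not the DeepMSD implementation) of connecting surface-level
# tokens with LVLM-generated deep-level knowledge in a cross-knowledge graph.
# All names here (query_lvlm, CrossKnowledgeGraph, build_graph) are hypothetical.

from dataclasses import dataclass, field
from itertools import product


def query_lvlm(image_path: str, caption: str) -> str:
    """Placeholder for a large vision-language model call that would return
    deep-level knowledge (background/intent) behind a text-image pair."""
    return "mocking enthusiasm about a delayed flight"  # stand-in output


@dataclass
class CrossKnowledgeGraph:
    nodes: list[str] = field(default_factory=list)
    edges: set[tuple[int, int]] = field(default_factory=set)

    def add_nodes(self, tokens: list[str]) -> list[int]:
        start = len(self.nodes)
        self.nodes.extend(tokens)
        return list(range(start, len(self.nodes)))

    def connect(self, src_ids: list[int], dst_ids: list[int]) -> None:
        # Link every surface-level node to every deep-level node so a
        # downstream graph reasoner can propagate sarcastic cues between them.
        for s, d in product(src_ids, dst_ids):
            self.edges.add((s, d))


def build_graph(caption: str, image_path: str) -> CrossKnowledgeGraph:
    graph = CrossKnowledgeGraph()
    surface_ids = graph.add_nodes(caption.lower().split())                    # surface-level knowledge
    deep_ids = graph.add_nodes(query_lvlm(image_path, caption).lower().split())  # deep-level knowledge
    graph.connect(surface_ids, deep_ids)
    return graph


if __name__ == "__main__":
    g = build_graph("what a great start to the trip", "post.jpg")
    print(len(g.nodes), "nodes,", len(g.edges), "cross-knowledge edges")
```

In practice the nodes would carry learned embeddings and the graph would feed a reasoning module (e.g., graph attention) rather than the plain token lists used in this sketch.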

Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
DOIs
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • cross-knowledge graph
  • deep-level knowledge
  • large vision-language models
  • multi-modal sarcasm detection
