Abstract
Multimodal sarcasm detection (MSD) requires predicting sarcastic sentiment by understanding diverse modalities of data (e.g., text, image). Beyond the surface-level information conveyed in a post, understanding the underlying deep-level knowledge, such as the background and intent behind the data, is crucial for recognizing sarcastic sentiment. However, previous works have often overlooked this aspect, limiting their performance. To tackle this challenge, we propose DeepMSD, a novel framework that generates supplemental deep-level knowledge to enhance the understanding of sarcastic content. Specifically, we first devise a Deep-level Knowledge Extraction Module that leverages large vision-language models to generate deep-level information behind text-image pairs. Additionally, we devise a Cross-knowledge Graph Reasoning Module to model how humans use prior knowledge to identify sarcastic cues in multimodal posts. This module constructs cross-knowledge graphs that connect deep-level knowledge with surface-level knowledge, enabling a more profound exploration of the cues underlying sarcasm. Experiments on the public MSD dataset demonstrate that our approach significantly surpasses previous state-of-the-art methods.
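The cross-knowledge graph idea in the abstract can be illustrated with a minimal, hypothetical sketch. This is not the paper's actual implementation: the node sets, the toy word-overlap similarity rule, and the example data below are all assumptions made purely for illustration. Surface-level tokens from a post and deep-level phrases (as an LVLM might generate) become nodes, and edges link nodes that share vocabulary.

```python
# Illustrative sketch only: the similarity rule and all data here are
# hypothetical, not the DeepMSD method described in the abstract.

def build_cross_knowledge_graph(surface_tokens, deep_phrases):
    """Connect surface-level tokens to deep-level knowledge phrases
    whenever they share at least one word (a toy similarity rule)."""
    edges = []
    for tok in surface_tokens:
        for phrase in deep_phrases:
            if tok.lower() in {w.lower() for w in phrase.split()}:
                edges.append((tok, phrase))
    return edges

# Hypothetical example: a sarcastic post about bad weather.
surface = ["great", "weather", "today"]
deep = ["storm ruined the weather", "intent: complaint via irony"]
graph_edges = build_cross_knowledge_graph(surface, deep)
# The cross edge ties the surface token "weather" to the deep-level
# background phrase, hinting at the ironic contrast with "great".
```

In a real system the word-overlap rule would presumably be replaced by learned embedding similarity, and the graph would feed a reasoning module rather than be inspected directly.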
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 6413-6423 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 35 |
| Issue number | 7 |
| DOIs | |
| Publication status | Published - 2025 |
| Externally published | Yes |
Keywords
- Multi-modal sarcasm detection
- cross-knowledge graph
- deep-level knowledge
- large vision-language models
Fingerprint
Dive into the research topics of 'DeepMSD: Advancing Multimodal Sarcasm Detection Through Knowledge-Augmented Graph Reasoning'.