Which is more faithful, seeing or saying? Multimodal sarcasm detection exploiting contrasting sentiment knowledge

Yutao Chen, Shumin Shi*, Heyan Huang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Using sarcasm on social media platforms to express negative opinions towards a person or object has become increasingly common. However, sarcasm can be difficult to detect across forms of communication because it conveys conflicting sentiments. In this paper, we introduce a contrasting sentiment-based model for multimodal sarcasm detection (CS4MSD), which identifies inconsistent emotions by leveraging a CLIP knowledge module to produce sentiment features for both text and image. Five external sentiments are then introduced to prompt the model to learn sentiment preferences across modalities. Furthermore, we highlight the importance of verbal descriptions embedded in images and incorporate additional knowledge-sharing modules to fuse such image-like features. Experimental results demonstrate that our model achieves state-of-the-art performance on the public multimodal sarcasm dataset.

Original language: English
Pages (from-to): 375-386
Number of pages: 12
Journal: CAAI Transactions on Intelligence Technology
Volume: 10
Issue number: 2
Publication status: Published - Apr 2025
Externally published: Yes

Keywords

  • CLIP
  • image-text classification
  • knowledge fusion
  • multi-modal
  • sarcasm detection
