Abstract
Using sarcasm on social media platforms to express negative opinions towards a person or object has become increasingly common. However, detecting sarcasm in various forms of communication can be difficult because the modalities often carry conflicting sentiments. In this paper, we introduce a contrasting sentiment-based model for multimodal sarcasm detection (CS4MSD), which identifies inconsistent emotions by leveraging the CLIP knowledge module to produce sentiment features for both the text and the image. Five external sentiments are then introduced to prompt the model to learn sentiment preferences among modalities. Furthermore, we highlight the importance of verbal descriptions embedded in illustrations and incorporate additional knowledge-sharing modules to fuse such image-like features. Experimental results demonstrate that our model achieves state-of-the-art performance on the public multimodal sarcasm dataset.
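The abstract describes the core idea only at a high level. The sketch below is a minimal, hypothetical illustration (not the authors' implementation) of how the contrasting-sentiment signal could be realised with an off-the-shelf CLIP model: the post text and the image are each scored against five external sentiment prompts, and a small classifier decides whether the two sentiment distributions disagree enough to indicate sarcasm. All class names, prompt wordings, and layer sizes are assumptions; the paper's knowledge-sharing fusion of text embedded in the image (e.g., OCR-style descriptions) is omitted here.

```python
# Illustrative sketch of a contrasting-sentiment sarcasm detector built on CLIP.
# Hypothetical names and hyperparameters; not the CS4MSD reference implementation.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

# Five external sentiment anchors (assumed wording).
SENTIMENT_PROMPTS = [
    "a very negative post", "a negative post", "a neutral post",
    "a positive post", "a very positive post",
]

class ContrastingSentimentDetector(nn.Module):
    def __init__(self, clip_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        self.processor = CLIPProcessor.from_pretrained(clip_name)
        # Classifier over the concatenated text/image sentiment distributions.
        self.classifier = nn.Sequential(
            nn.Linear(2 * len(SENTIMENT_PROMPTS), 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # sarcastic vs. non-sarcastic
        )

    def _sentiment_scores(self, feats, prompt_feats):
        # Cosine similarity to each sentiment prompt, softmaxed into a distribution.
        feats = feats / feats.norm(dim=-1, keepdim=True)
        prompt_feats = prompt_feats / prompt_feats.norm(dim=-1, keepdim=True)
        return (feats @ prompt_feats.T).softmax(dim=-1)

    def forward(self, texts, images):
        # Encode captions, sentiment prompts, and images with the shared CLIP encoders.
        inputs = self.processor(text=list(texts) + SENTIMENT_PROMPTS, images=images,
                                return_tensors="pt", padding=True, truncation=True)
        text_feats = self.clip.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
        image_feats = self.clip.get_image_features(pixel_values=inputs["pixel_values"])
        prompt_feats = text_feats[len(texts):]   # embeddings of the sentiment prompts
        caption_feats = text_feats[:len(texts)]  # embeddings of the actual captions

        txt_sent = self._sentiment_scores(caption_feats, prompt_feats)
        img_sent = self._sentiment_scores(image_feats, prompt_feats)
        # The head sees both sentiment distributions; their disagreement is the cue.
        return self.classifier(torch.cat([txt_sent, img_sent], dim=-1))
```

In this simplified view, sarcasm is predicted from the mismatch between the text-side and image-side sentiment distributions, which mirrors the "contrasting sentiment" intuition stated in the abstract.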
| Original language | English |
| --- | --- |
| Pages (from-to) | 375-386 |
| Number of pages | 12 |
| Journal | CAAI Transactions on Intelligence Technology |
| Volume | 10 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - Apr 2025 |
| Externally published | Yes |
Keywords
- CLIP
- image-text classification
- knowledge fusion
- multi-modal
- sarcasm detection