Towards multimodal sarcasm detection via label-aware graph contrastive learning with back-translation augmentation

Yiwei Wei, Maomao Duan, Hengyang Zhou, Zhiyang Jia, Zengwei Gao, Longbiao Wang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal sarcasm detection, as a sentiment analysis task, has made great strides owing to the rapid development of multimodal machine learning. However, existing graph-based studies mainly focus on capturing atomic-aware relations between textual and visual graphs within individual instances, neglecting label-aware connections across different instances. To address this limitation, we propose a novel Label-aware Graph Contrastive Learning (LGCL) method that detects ironic cues from a label-aware perspective on multimodal data. We first construct unimodal graphs for each instance and fuse them in a shared graph semantic space to obtain multimodal graphs. We then introduce two label-aware graph contrastive losses, a Label-aware Unimodal Contrastive Loss (LUCL) and a Label-aware Multimodal Contrastive Loss (LMCL), which make the model aware of the shared ironic cues associated with sentiment labels within multimodal graph representations. Additionally, we propose Back-translation Data Augmentation (BTrA) for both textual and visual data to enhance contrastive learning, designing different back-translation schemes to generate a larger number of positive and negative samples. Experimental results on two public datasets demonstrate that our method achieves state-of-the-art (SOTA) performance compared with previous methods.
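The abstract does not spell out the exact formulation of LUCL and LMCL, but the description matches a supervised (label-aware) contrastive objective in which instances sharing a sarcasm label form positive pairs. Below is a minimal PyTorch sketch of such a loss over pooled graph embeddings; the function name `label_aware_contrastive_loss`, the temperature value, and the toy inputs are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def label_aware_contrastive_loss(embeddings: torch.Tensor,
                                 labels: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """Supervised (label-aware) contrastive loss over pooled graph embeddings.

    Instances that share a sarcasm label are treated as positives and pulled
    together; instances with different labels are pushed apart.
    embeddings: (N, d) pooled graph representations; labels: (N,) int tensor.
    """
    z = F.normalize(embeddings, dim=1)                   # unit-norm embeddings
    sim = z @ z.T / temperature                          # scaled cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)  # keep positive terms only
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                               # anchors with >=1 positive
    loss = -pos_log_prob.sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean()

# Toy usage: the same loss could be applied to unimodal embeddings
# (LUCL-style) and to fused multimodal embeddings (LMCL-style).
if __name__ == "__main__":
    emb = torch.randn(8, 64)
    lab = torch.tensor([0, 1, 0, 1, 1, 0, 0, 1])
    print(label_aware_contrastive_loss(emb, lab).item())
```

Under this reading, the back-translation augmentation (BTrA) would supply additional embedded views of each instance, enlarging the sets of positive and negative pairs that the loss contrasts.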

Original language: English
Article number: 112109
Journal: Knowledge-Based Systems
Volume: 300
DOIs
Publication status: Published - 27 Sept 2024
Externally published: Yes

Keywords

  • Back-translation augmentation
  • Label-aware contrastive learning
  • Multimodal sarcasm detection
