TY - JOUR
T1 - Towards multimodal sarcasm detection via label-aware graph contrastive learning with back-translation augmentation
AU - Wei, Yiwei
AU - Duan, Maomao
AU - Zhou, Hengyang
AU - Jia, Zhiyang
AU - Gao, Zengwei
AU - Wang, Longbiao
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/9/27
Y1 - 2024/9/27
N2 - Multimodal sarcasm detection, as a sentiment analysis task, has made great strides owing to the rapid development of multimodal machine learning. However, existing graph-based studies mainly focus on capturing the atomic-aware relations between textual and visual graphs within individual instances, neglecting label-aware connections between different instances. To address this limitation, we propose a novel Label-aware Graph Contrastive Learning (LGCL) method that detects ironic cues from a label-aware perspective on multimodal data. We first construct unimodal graphs for each instance and fuse them into a graph semantic space to obtain the multimodal graphs. Then, we introduce two label-aware graph contrastive losses, a Label-aware Unimodal Contrastive Loss (LUCL) and a Label-aware Multimodal Contrastive Loss (LMCL), to make the model aware of the shared ironic cues related to sentiment labels within multimodal graph representations. Additionally, we propose Back-translation Data Augmentation (BTrA) for both textual and visual data to enhance contrastive learning, where different back-translation schemes are designed to generate a larger number of positive and negative samples. Experimental results on two public datasets demonstrate that our method achieves state-of-the-art (SOTA) performance compared with previous methods.
AB - Multimodal sarcasm detection, as a sentiment analysis task, has made great strides owing to the rapid development of multimodal machine learning. However, existing graph-based studies mainly focus on capturing the atomic-aware relations between textual and visual graphs within individual instances, neglecting label-aware connections between different instances. To address this limitation, we propose a novel Label-aware Graph Contrastive Learning (LGCL) method that detects ironic cues from a label-aware perspective on multimodal data. We first construct unimodal graphs for each instance and fuse them into a graph semantic space to obtain the multimodal graphs. Then, we introduce two label-aware graph contrastive losses, a Label-aware Unimodal Contrastive Loss (LUCL) and a Label-aware Multimodal Contrastive Loss (LMCL), to make the model aware of the shared ironic cues related to sentiment labels within multimodal graph representations. Additionally, we propose Back-translation Data Augmentation (BTrA) for both textual and visual data to enhance contrastive learning, where different back-translation schemes are designed to generate a larger number of positive and negative samples. Experimental results on two public datasets demonstrate that our method achieves state-of-the-art (SOTA) performance compared with previous methods.
KW - Back-translation augmentation
KW - Label-aware contrastive learning
KW - Multimodal sarcasm detection
UR - http://www.scopus.com/inward/record.url?scp=85198747624&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2024.112109
DO - 10.1016/j.knosys.2024.112109
M3 - Article
AN - SCOPUS:85198747624
SN - 0950-7051
VL - 300
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 112109
ER -