TY - JOUR
T1 - HC-CoT
T2 - A hierarchical causal chain-of-thought framework for multimodal sarcasm detection
AU - Zhao, Tianyu
AU - Zhu, Junlong
AU - Meng, Ling Ang
AU - Song, Dawei
N1 - Publisher Copyright:
© 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/8/1
Y1 - 2026/8/1
N2 - Multimodal sarcasm detection relies on identifying semantic incongruity between the visual and textual modalities. However, existing methods typically model incongruity through monolithic feature fusion or shallow cross-modal interactions, neglecting the hierarchical structure of sarcasm, which manifests distinctively at the entity, attribute, and scene levels. Consequently, these models often rely on spurious correlations rather than genuine causal dependencies, which limits their robustness to modality imbalance and distribution shifts. To address these challenges, we propose Hierarchical Causal Chain-of-Thought (HC-CoT), a hierarchical causal reasoning framework that models sarcasm with a three-level Hierarchical Structural Causal Model (H-SCM) whose bottom-up causal structure (Entity → Attribute → Scene) is learned under explicit sparsity and acyclicity constraints. Here, Chain-of-Thought refers to a structured inference trace over the hierarchical latent variables of the H-SCM rather than to LLM-style natural-language rationales. Over this SCM, HC-CoT performs bidirectional inference: bottom-up evidence aggregation forms scene hypotheses, while top-down contextual refinement re-evaluates lower-level states without introducing reverse causal edges. Training combines supervised learning with (i) missing-modality consistency regularization and (ii) counterfactual augmentation with explicit label policies, improving robustness without relying on heuristic shortcut cues. Extensive experiments on the MMSD and MMSD2.0 benchmarks demonstrate that HC-CoT achieves new state-of-the-art performance, with significant gains in accuracy, robustness, and interpretability.
AB - Multimodal sarcasm detection relies on identifying semantic incongruity between the visual and textual modalities. However, existing methods typically model incongruity through monolithic feature fusion or shallow cross-modal interactions, neglecting the hierarchical structure of sarcasm, which manifests distinctively at the entity, attribute, and scene levels. Consequently, these models often rely on spurious correlations rather than genuine causal dependencies, which limits their robustness to modality imbalance and distribution shifts. To address these challenges, we propose Hierarchical Causal Chain-of-Thought (HC-CoT), a hierarchical causal reasoning framework that models sarcasm with a three-level Hierarchical Structural Causal Model (H-SCM) whose bottom-up causal structure (Entity → Attribute → Scene) is learned under explicit sparsity and acyclicity constraints. Here, Chain-of-Thought refers to a structured inference trace over the hierarchical latent variables of the H-SCM rather than to LLM-style natural-language rationales. Over this SCM, HC-CoT performs bidirectional inference: bottom-up evidence aggregation forms scene hypotheses, while top-down contextual refinement re-evaluates lower-level states without introducing reverse causal edges. Training combines supervised learning with (i) missing-modality consistency regularization and (ii) counterfactual augmentation with explicit label policies, improving robustness without relying on heuristic shortcut cues. Extensive experiments on the MMSD and MMSD2.0 benchmarks demonstrate that HC-CoT achieves new state-of-the-art performance, with significant gains in accuracy, robustness, and interpretability.
KW - Bidirectional inference
KW - Causal graphs
KW - Hierarchical causal learning
KW - Multimodal sarcasm detection
KW - Structural causal model
UR - https://www.scopus.com/pages/publications/105035262932
U2 - 10.1016/j.eswa.2026.132291
DO - 10.1016/j.eswa.2026.132291
M3 - Article
AN - SCOPUS:105035262932
SN - 0957-4174
VL - 322
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 132291
ER -