Abstract
Multimodal Aspect-Based Sentiment Analysis (MABSA) is challenging in data-heterologous settings, where images provide only weak or noisy context for textual aspects. Existing methods based on unconditional fusion or generic MLLM captions often suffer from granularity mismatch, hallucination, and irrelevant visual noise. We propose MADSC (Multimodal Aspect-aware Description with Similarity and Calibration), which strengthens aspect-aware grounding by refining generic captions into aspect-centric descriptions. MADSC uses a dual-similarity estimator to align aspects with caption objects through CLIP-based semantic compatibility and box-mediated visual grounding, and employs confidence calibration to gate unreliable visual cues during decoding. Experiments on Twitter-2015 and Twitter-2017 demonstrate state-of-the-art results on MATE, MABSA, and JMASA, confirming the effectiveness of aspect-aware refinement and calibrated alignment.
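The abstract names two mechanisms: a dual-similarity estimator and a confidence-calibrated gate. The sketch below illustrates the general idea only and is not the authors' implementation. It makes several assumptions that do not come from the source: embeddings are plain Python lists standing in for CLIP text and region features, the two similarities are mixed with a hypothetical weight `alpha`, calibration is a temperature-scaled sigmoid with hypothetical `temperature`, `bias`, and `threshold` values, and fusion is a simple gated sum.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dual_similarity(aspect_emb, object_text_emb, box_region_emb, alpha=0.5):
    """Mix a semantic score and a box-based visual score (alpha is an assumed weight).

    aspect_emb:       embedding of the aspect term (stand-in for a CLIP text feature)
    object_text_emb:  embedding of an object word from the caption
    box_region_emb:   embedding of the image region inside that object's bounding box
    """
    semantic = cosine(aspect_emb, object_text_emb)
    visual = cosine(aspect_emb, box_region_emb)
    return alpha * semantic + (1.0 - alpha) * visual


def calibrated_gate(score, temperature=0.1, bias=0.5, threshold=0.5):
    """Map a raw similarity to a confidence and zero it out when it is too low.

    The sigmoid form and all three parameters are illustrative assumptions.
    """
    confidence = 1.0 / (1.0 + math.exp(-(score - bias) / temperature))
    return confidence if confidence >= threshold else 0.0


def fuse(text_feat, visual_feat, gate):
    """Gated sum: the visual feature contributes in proportion to the gate value."""
    return [t + gate * v for t, v in zip(text_feat, visual_feat)]


if __name__ == "__main__":
    aspect = [1.0, 0.0]
    matching_object, matching_box = [1.0, 0.0], [0.9, 0.1]
    unrelated_object, unrelated_box = [0.0, 1.0], [0.1, 0.9]

    for name, obj, box in [
        ("matching", matching_object, matching_box),
        ("unrelated", unrelated_object, unrelated_box),
    ]:
        score = dual_similarity(aspect, obj, box)
        gate = calibrated_gate(score)
        print(name, round(score, 3), round(gate, 3), fuse([1.0, 2.0], [10.0, 10.0], gate))
```

In this toy setup, an aspect that matches neither the caption object nor its box region receives a gate of zero, so the fused feature falls back to the text feature alone. This is the behavior the abstract describes as gating unreliable visual cues.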
| Original language | English |
|---|---|
| Article number | 113712 |
| Journal | Pattern Recognition |
| Volume | 179 |
| Publication status | Published - Nov 2026 |
| Externally published | Yes |
Keywords
- Aspect-aware descriptions
- Confidence calibration
- Cross-modal fusion
- Modality gating
- Multimodal Aspect-Based Sentiment Analysis
Title
MADSC: Aspect-aware description and calibrated alignment for unified Multimodal Aspect-Based Sentiment Analysis