Aspect-level multimodal sentiment analysis via object-attention

Chaojie Zhu, Yuming Yan, Baochang Chu, Gang Li, Heyan Huang, Xiaoyan Gao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Aspect-level multimodal sentiment analysis (ALMSA) aims to identify the sentiment polarity of a specific aspect word using both sentence and image data. Current models often rely on the global features of images, overlooking the details in the original image. To address this issue, we propose an object attention-based aspect-level multimodal sentiment analysis model (OAB-ALMSA). This model first employs an object detection algorithm to capture detailed information about the objects in the original image. It then applies an object-attention mechanism and builds an iterative fusion layer to fully fuse the multimodal information. Finally, a curriculum learning strategy is developed to tackle the challenges of training with complex samples. Experiments conducted on the TWITTER-2015 data sets demonstrate that OAB-ALMSA, when combined with curriculum learning, achieves the highest F1 score. These results highlight that leveraging detailed image data enhances the model’s overall understanding and improves prediction accuracy.
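The abstract describes the pipeline only at a high level. The sketch below illustrates one plausible form of the object-attention step, in which an aspect-aware sentence representation attends over detector-extracted object features before fusion. The module name, feature dimensions, and the single-pass (non-iterative) fusion shown here are assumptions made for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of an object-attention fusion step. Assumes object features
# come from an off-the-shelf detector (e.g. Faster R-CNN region features) and the
# aspect-sentence representation from a pretrained text encoder; these choices are
# not confirmed by the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectAttentionFusion(nn.Module):
    def __init__(self, text_dim: int = 768, obj_dim: int = 2048, hidden_dim: int = 256):
        super().__init__()
        self.q_proj = nn.Linear(text_dim, hidden_dim)   # aspect-aware text as query
        self.k_proj = nn.Linear(obj_dim, hidden_dim)    # detected-object features as keys
        self.v_proj = nn.Linear(obj_dim, hidden_dim)    # ...and as values
        self.out = nn.Linear(text_dim + hidden_dim, text_dim)

    def forward(self, text_repr: torch.Tensor, obj_feats: torch.Tensor) -> torch.Tensor:
        # text_repr: (batch, text_dim) pooled aspect-sentence representation
        # obj_feats: (batch, num_objects, obj_dim) per-object detector features
        q = self.q_proj(text_repr).unsqueeze(1)               # (batch, 1, hidden)
        k = self.k_proj(obj_feats)                            # (batch, num_obj, hidden)
        v = self.v_proj(obj_feats)
        scores = (q @ k.transpose(1, 2)) / k.size(-1) ** 0.5  # scaled dot-product
        attn = F.softmax(scores, dim=-1)                      # weight on each object
        visual_ctx = (attn @ v).squeeze(1)                    # (batch, hidden)
        # Fuse the attended visual context with the text representation. The paper
        # iterates this fusion; a single pass is shown here for brevity.
        return self.out(torch.cat([text_repr, visual_ctx], dim=-1))
```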

Original language: English
Pages (from-to): 1562-1572
Number of pages: 11
Journal: CAAI Transactions on Intelligent Systems
Volume: 19
Issue number: 6
DOIs
Publication status: Published - 2024
Externally published: Yes

Keywords

  • aspect-level sentiment analysis
  • deep learning
  • feature extraction
  • multimodal
  • natural language processing systems
  • object detection
  • self-attention
  • sentiment analysis
