Abstract
Aspect-level multimodal sentiment analysis (ALMSA) aims to identify the sentiment polarity of a specific aspect word using both sentence and image data. Current models often rely on the global features of images, overlooking fine-grained object-level details in the original image. To address this issue, we propose an object attention-based aspect-level multimodal sentiment analysis model (OAB-ALMSA). This model first employs an object detection algorithm to capture detailed information about the objects in the original image. It then applies an object-attention mechanism and builds an iterative fusion layer to fully fuse the multimodal information. Finally, a curriculum learning strategy is developed to tackle the challenges of training with complex samples. Experiments conducted on the TWITTER-2015 dataset demonstrate that OAB-ALMSA, when combined with curriculum learning, achieves the highest F1 score. These results highlight that leveraging detailed image data enhances the model's overall understanding and improves prediction accuracy.
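The abstract outlines a three-stage pipeline (object features from a detector, object-attention, iterative fusion), but the paper's exact architecture is not reproduced here. The following is a minimal PyTorch sketch of that pipeline under stated assumptions: object features are assumed to come from an off-the-shelf detector and be projected to the text dimension, the object-attention step is approximated with standard multi-head cross-attention, and the number of fusion iterations, hidden size, and pooling choice are all illustrative, not the authors' settings.

```python
# Minimal sketch (not the authors' code) of an object-attention fusion
# pipeline: text tokens attend over detected-object features, and the
# fused representation is refined over several iterations before an
# aspect-level sentiment classification head.
import torch
import torch.nn as nn


class ObjectAttentionFusion(nn.Module):
    def __init__(self, dim=768, heads=8, iterations=3, classes=3):
        super().__init__()
        # Object-attention: text queries attend over per-object features
        # (e.g., pooled regions from an object detector, projected to dim).
        self.obj_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Self-attention re-integrates the fused sequence each iteration.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.iterations = iterations
        self.classifier = nn.Linear(dim, classes)  # negative / neutral / positive

    def forward(self, text_feats, obj_feats):
        # text_feats: (B, T, dim) aspect-aware sentence encoding
        # obj_feats:  (B, K, dim) features of K detected objects
        h = text_feats
        for _ in range(self.iterations):
            # Attend from text tokens to the detected objects.
            attended, _ = self.obj_attn(h, obj_feats, obj_feats)
            h = self.norm1(h + attended)
            # Refine the fused sequence with self-attention.
            refined, _ = self.self_attn(h, h, h)
            h = self.norm2(h + refined)
        # Pool (here: first token) and classify the aspect's polarity.
        return self.classifier(h[:, 0])


model = ObjectAttentionFusion()
text = torch.randn(2, 32, 768)     # e.g., BERT-style token embeddings
objects = torch.randn(2, 10, 768)  # e.g., projected detector features
logits = model(text, objects)      # (2, 3) sentiment logits
```

For the curriculum learning strategy the abstract mentions, a typical implementation would rank training samples by an estimated difficulty (for example, the loss of a preliminary model) and grow the training pool from easy to hard samples over epochs; the paper's specific difficulty measure is not given here.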
| Original language | English |
| --- | --- |
| Pages (from-to) | 1562-1572 |
| Number of pages | 11 |
| Journal | CAAI Transactions on Intelligent Systems |
| Volume | 19 |
| Issue number | 6 |
| DOIs | |
| Publication status | Published - 2024 |
| Externally published | Yes |
Keywords
- aspect-level sentiment analysis
- deep learning
- feature extraction
- multimodal
- natural language processing systems
- object detection
- self-attention
- sentiment analysis