TY - JOUR
T1 - Multi-Level Attention Map Network for Multimodal Sentiment Analysis
AU - Xue, Xiaojun
AU - Zhang, Chunxia
AU - Niu, Zhendong
AU - Wu, Xindong
N1 - Publisher Copyright:
© 1989-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Multimodal sentiment analysis (MSA) is a challenging task due to the complex and complementary interactions between multiple modalities, and it can be widely applied in areas such as product marketing and public opinion monitoring. However, previous works directly utilized the features extracted from multimodal data, largely ignoring noise reduction within and among modalities before multimodal fusion. This paper proposes a multi-level attention map network (MAMN) to filter noise before multimodal fusion and to capture the consistent and heterogeneous correlations among multi-granularity features for multimodal sentiment analysis. Architecturally, MAMN comprises three modules: a multi-granularity feature extraction module, a multi-level attention map generation module, and an attention map fusion module. The first module extracts multi-granularity features from multimodal data. The second module filters noise and enhances the representation ability of multi-granularity features before multimodal fusion. The third module extensibly mines the interactions among multi-level attention maps via the proposed extensible co-attention fusion method. Extensive experiments on three public datasets show that the proposed model significantly outperforms state-of-the-art methods and demonstrate its effectiveness on both document-based and aspect-based MSA tasks.
AB - Multimodal sentiment analysis (MSA) is a challenging task due to the complex and complementary interactions between multiple modalities, and it can be widely applied in areas such as product marketing and public opinion monitoring. However, previous works directly utilized the features extracted from multimodal data, largely ignoring noise reduction within and among modalities before multimodal fusion. This paper proposes a multi-level attention map network (MAMN) to filter noise before multimodal fusion and to capture the consistent and heterogeneous correlations among multi-granularity features for multimodal sentiment analysis. Architecturally, MAMN comprises three modules: a multi-granularity feature extraction module, a multi-level attention map generation module, and an attention map fusion module. The first module extracts multi-granularity features from multimodal data. The second module filters noise and enhances the representation ability of multi-granularity features before multimodal fusion. The third module extensibly mines the interactions among multi-level attention maps via the proposed extensible co-attention fusion method. Extensive experiments on three public datasets show that the proposed model significantly outperforms state-of-the-art methods and demonstrate its effectiveness on both document-based and aspect-based MSA tasks.
KW - Multimodal sentiment analysis
KW - multimodal fusion
KW - opinion mining
KW - social analysis
UR - http://www.scopus.com/inward/record.url?scp=85125706917&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2022.3155290
DO - 10.1109/TKDE.2022.3155290
M3 - Article
AN - SCOPUS:85125706917
SN - 1041-4347
VL - 35
SP - 5105
EP - 5118
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 5
ER -