TY - JOUR
T1 - One for All
T2 - A Unified Generative Framework for Image Emotion Classification
AU - Shi, Ge
AU - Deng, Sinuo
AU - Wang, Bo
AU - Feng, Chong
AU - Zhuang, Yan
AU - Wang, Xiaomei
N1 - Publisher Copyright:
IEEE
PY - 2023
Y1 - 2023
N2 - Image Emotion Classification (IEC) is an essential research area, offering valuable insights into user emotional states for a wide range of applications, including opinion mining, recommendation systems, and mental health treatment. The challenges associated with IEC are mainly attributed to the complexity and ambiguity of human emotions, the lack of a universally accepted emotion model, and excessive dependence on prior knowledge. To address these challenges, we propose a novel Unified Generative framework for Image Emotion Classification (UGRIE), which is capable of simultaneously modeling various emotion models and capturing intricate semantic relationships between emotion labels. Our approach employs a flexible natural language template, converting the IEC task into a template-filling process that can be easily adapted to accommodate a diverse range of IEC tasks. To further enhance performance, we devise a mapping mechanism to seamlessly integrate the multimodal pre-training model CLIP with the text generation pre-training model BART, thus leveraging the strengths of both models. A comprehensive set of experiments conducted on multiple public datasets demonstrates that our proposed method consistently outperforms existing approaches by a large margin in supervised settings, exhibits remarkable performance in low-resource scenarios, and unifies distinct emotion models within a single, versatile framework.
AB - Image Emotion Classification (IEC) is an essential research area, offering valuable insights into user emotional states for a wide range of applications, including opinion mining, recommendation systems, and mental health treatment. The challenges associated with IEC are mainly attributed to the complexity and ambiguity of human emotions, the lack of a universally accepted emotion model, and excessive dependence on prior knowledge. To address these challenges, we propose a novel Unified Generative framework for Image Emotion Classification (UGRIE), which is capable of simultaneously modeling various emotion models and capturing intricate semantic relationships between emotion labels. Our approach employs a flexible natural language template, converting the IEC task into a template-filling process that can be easily adapted to accommodate a diverse range of IEC tasks. To further enhance performance, we devise a mapping mechanism to seamlessly integrate the multimodal pre-training model CLIP with the text generation pre-training model BART, thus leveraging the strengths of both models. A comprehensive set of experiments conducted on multiple public datasets demonstrates that our proposed method consistently outperforms existing approaches by a large margin in supervised settings, exhibits remarkable performance in low-resource scenarios, and unifies distinct emotion models within a single, versatile framework.
KW - Adaptation models
KW - Data models
KW - Emotion recognition
KW - IEC
KW - Pre-training model
KW - Psychology
KW - Semantics
KW - Task analysis
KW - image emotion classification
KW - multi-modal learning
UR - http://www.scopus.com/inward/record.url?scp=85180284158&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2023.3341840
DO - 10.1109/TCSVT.2023.3341840
M3 - Article
AN - SCOPUS:85180284158
SN - 1051-8215
SP - 1
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -