One for All: A Unified Generative Framework for Image Emotion Classification

  • Ge Shi
  • Sinuo Deng
  • Bo Wang
  • Chong Feng*
  • Yan Zhuang
  • Xiaomei Wang

*Corresponding author for this work

  • Beijing University of Technology
  • Beijing Zhongke Huili Technology Company Ltd.
  • Beijing Institute of Technology
  • General Hospital of People's Liberation Army
  • CAS - Institutes of Science and Development

Research output: Contribution to journal › Article › peer-review

Abstract

Image Emotion Classification (IEC) is an essential research area, offering valuable insights into user emotional states for a wide range of applications, including opinion mining, recommendation systems, and mental health treatment. The challenges associated with IEC are mainly attributed to the complexity and ambiguity of human emotions, the lack of a universally accepted emotion model, and excessive dependence on prior knowledge. To address these challenges, we propose a novel Unified Generative framework for Image Emotion Classification (UGRIE), which is capable of simultaneously modeling various emotion models and capturing intricate semantic relationships between emotion labels. Our approach employs a flexible natural language template, converting the IEC task into a template-filling process that can be easily adapted to accommodate a diverse range of IEC tasks. To further enhance performance, we devise a mapping mechanism to seamlessly integrate the multimodal pre-training model CLIP with the text generation pre-training model BART, thus leveraging the strengths of both models. A comprehensive set of experiments conducted on multiple public datasets demonstrates that our proposed method consistently outperforms existing approaches by a large margin in supervised settings, exhibits remarkable performance in low-resource scenarios, and unifies distinct emotion models within a single, versatile framework.
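The abstract's two key ideas — projecting CLIP image features into BART's embedding space, and recasting classification as natural-language template filling — can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the embedding dimensions, the template wording, and the single linear projection are all assumptions.

```python
import numpy as np

# Assumed embedding sizes (typical for CLIP ViT-B and BART-base; the
# paper's actual configuration may differ).
CLIP_DIM, BART_DIM = 512, 768

rng = np.random.default_rng(0)
# Stand-in for a learned projection; in the real model this mapping
# would be trained jointly with the generator.
W = rng.normal(scale=0.02, size=(CLIP_DIM, BART_DIM))

def map_clip_to_bart(clip_feats: np.ndarray) -> np.ndarray:
    """Project CLIP image features into the BART embedding space,
    where they can serve as a visual prefix for the text generator."""
    return clip_feats @ W

def fill_template(emotion: str) -> str:
    """Cast classification as template filling: instead of predicting a
    class index, the generator produces the label word in a sentence."""
    return f"The emotion of this image is {emotion}."

# Example: one image embedding becomes a (1, BART_DIM) prefix vector.
clip_feats = rng.normal(size=(1, CLIP_DIM))
prefix = map_clip_to_bart(clip_feats)
print(prefix.shape)                    # (1, 768)
print(fill_template("amusement"))      # a filled template for one label
```

Because the output is free text rather than a fixed softmax over classes, the same framework can accommodate different emotion models (e.g. different label vocabularies) without changing the architecture — which is the "one for all" claim of the title.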

Original language: English
Pages (from-to): 7057-7068
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 34
Issue number: 8
Publication status: Published - 2024

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • Pre-training model
  • image emotion classification
  • multi-modal learning
