MULTI-MODAL EMOTION RECOGNITION USING SITUATION-BASED VIDEO CONTEXT EMOTION DATASET

  • Guiping Lu
  • Honghua Liu
  • Kejun Wang
  • Weidong Hu
  • Wenliang Peng
  • Tao Yang
  • Shan Lu

Research output: Contribution to journal › Article › peer-review

Abstract

Current multi-modal emotion recognition techniques primarily use modalities such as expression, speech, text, and gesture. Existing methods capture emotion only from the current moment in a picture or video, neglecting the influence of time and past experiences on human emotion. Expanding the temporal scope can provide more clues for emotion recognition. To address this, we constructed the Situation-Based Video Context Emotion (SVCEmotion) dataset in video form. Experiments show that both VGGish and BERT-base achieve good results on SVCEmotion. Comparison with other audio emotion recognition methods shows that VGGish is better suited to audio emotion feature extraction on the dataset constructed in this paper. Comparison experiments with textual descriptions demonstrate that the contextual descriptions introduced in the SVCEmotion dataset provide clues for emotion recognition over a wide time range, and that combining them with factual descriptions substantially improves emotion recognition performance.
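The abstract describes combining VGGish audio features with BERT text features for emotion classification. As a hedged illustration only (the paper's actual fusion architecture is not specified here), a minimal late-fusion sketch might concatenate a precomputed 128-dimensional VGGish embedding with a 768-dimensional BERT embedding and score emotion classes with a linear layer; the class count, weights, and embeddings below are all hypothetical stand-ins:

```python
import numpy as np

# Assumed setup: 128-d VGGish audio embedding, 768-d BERT text embedding,
# and a 7-class emotion label set (all illustrative, not from the paper).
NUM_CLASSES = 7
rng = np.random.default_rng(0)

def fuse_and_classify(audio_emb, text_emb, W, b):
    """Concatenate modality embeddings, apply a linear layer and softmax."""
    fused = np.concatenate([audio_emb, text_emb])   # shape (896,)
    logits = W @ fused + b                          # shape (NUM_CLASSES,)
    exp = np.exp(logits - logits.max())             # numerically stable softmax
    return exp / exp.sum()

# Stand-in embeddings in place of real VGGish / BERT outputs.
audio_emb = rng.standard_normal(128)
text_emb = rng.standard_normal(768)
W = rng.standard_normal((NUM_CLASSES, 128 + 768)) * 0.01
b = np.zeros(NUM_CLASSES)

probs = fuse_and_classify(audio_emb, text_emb, W, b)
print(probs.shape)  # one probability per emotion class
```

In practice the random arrays would be replaced by embeddings from pretrained VGGish and BERT encoders, and the linear layer would be trained on the dataset's emotion labels.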

Original language: English
Pages (from-to): 1123-1143
Number of pages: 21
Journal: Computing and Informatics
Volume: 44
Issue number: 5
DOIs
Publication status: Published - 2025

Keywords

  • Multi-modal fusion
  • dataset
  • deep learning
  • emotion recognition
  • transfer learning

