Multimodal Emotion Recognition Based on Multi-Scale Facial Features and Cross-Modal Attention

Chengao Bao, Luefeng Chen*, Min Li, Min Wu, Witold Pedrycz, Kaoru Hirota

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A multimodal emotion recognition method based on multi-scale facial features and a cross-modal attention (MS-FCA) network is proposed. The MS-FCA model extends the traditional single-branch ViT into a two-branch ViT architecture in which the classification token of each branch interacts with the patch embeddings of the other branch, enabling effective interaction between information at different scales. Audio features are extracted with a ResNet18 network. A cross-modal attention mechanism then computes weight matrices between the features of the two modalities, exploiting inter-modal correlation to fuse the visual and audio features for more accurate emotion recognition. Experiments are conducted on two datasets, eNTERFACE'05 and RAVDESS. The proposed method achieves accuracies of 85.42% on eNTERFACE'05 and 83.84% on RAVDESS, demonstrating its effectiveness.
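The cross-modal attention step described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact architecture: the feature dimensions, the mean-pooling, and the concatenation-based fusion are illustrative assumptions, and the learned query/key/value projections of a full attention layer are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats):
    """Scaled dot-product attention where one modality queries the other.

    query_feats: (n_q, d) features of the querying modality
    key_feats:   (n_k, d) features of the other modality (used as keys and values)
    Returns (n_q, d) attended features.
    """
    d_k = query_feats.shape[-1]
    # (n_q, n_k) inter-modal weight matrix, as described in the abstract.
    scores = query_feats @ key_feats.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ key_feats

rng = np.random.default_rng(0)
visual = rng.standard_normal((49, 128))  # stand-in for ViT patch tokens (assumed shape)
audio = rng.standard_normal((20, 128))   # stand-in for flattened ResNet18 features (assumed shape)

v_attended = cross_modal_attention(visual, audio)  # visual tokens query audio features
a_attended = cross_modal_attention(audio, visual)  # audio features query visual tokens

# One plausible fusion: pool each attended stream and concatenate for a classifier head.
fused = np.concatenate([v_attended.mean(axis=0), a_attended.mean(axis=0)])
print(fused.shape)  # (256,)
```

Running both attention directions and concatenating the pooled results is one common way to let each modality reweight the other before classification; the paper's fusion head may differ.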

Original language: English
Title of host publication: 2025 International Conference on Industrial Technology, ICIT 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331521950
Publication status: Published - 2025
Externally published: Yes
Event: 26th International Conference on Industrial Technology, ICIT 2025 - Wuhan, China
Duration: 26 Mar 2025 - 28 Mar 2025

Publication series

Name: Proceedings of the IEEE International Conference on Industrial Technology
ISSN (Print): 2641-0184
ISSN (Electronic): 2643-2978

Conference

Conference: 26th International Conference on Industrial Technology, ICIT 2025
Country/Territory: China
City: Wuhan
Period: 26/03/25 - 28/03/25

Keywords

  • cross-modal attention
  • multi-scale features
  • multimodal emotion recognition

