A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods

Bei Pan; Kaoru Hirota; Zhiyang Jia; Yaping Dai

doi:10.1016/j.neucom.2023.126866

Bei Pan, Kaoru Hirota, Zhiyang Jia^*, Yaping Dai

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Short survey › peer-review

25 Citations (Scopus)

Abstract

Affective computing is one of the most important research fields in modern human–computer interaction (HCI). The goal of affective computing is to study and develop theories, methods, and systems that can recognize, explain, process, and simulate human emotions. As a branch of affective computing, emotion recognition aims to enable machines to analyze human emotions automatically, and it has received increasing attention from researchers in various fields. Humans generally observe and understand a person's emotional state by integrating information perceived from that person's facial expressions, voice tone, speech content, behavior, or physiological features. To imitate the way humans observe emotion, researchers have devoted themselves to constructing multimodal emotion recognition models that fuse information from two or more modalities. In this paper, we provide a comprehensive review of multimodal emotion recognition in recent decades from the perspectives of multimodal datasets, data preprocessing, unimodal feature extraction, and multimodal information fusion methods. Furthermore, challenges and future research directions are identified and discussed. The main motivations of this review are to summarize the abundant recent work on multimodal emotion recognition and to provide guidance to researchers in related fields for understanding the pipeline and mainstream approaches to multimodal emotion recognition.

Original language	English
Article number	126866
Journal	Neurocomputing
Volume	561
DOIs	https://doi.org/10.1016/j.neucom.2023.126866
Publication status	Published - 7 Dec 2023

Keywords

Classifier
Emotion recognition
Feature learning
Multimodal information fusion

Cite this

@article{2bbb4997ea674272b6daf169452fd422,

title = "A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods",

abstract = "Affective computing is one of the most important research fields in modern human–computer interaction (HCI). The goal of affective computing is to study and develop the theories, methods, and systems that can recognize, explain, process, and simulate human emotions. As a branch of affective computing, emotion recognition aims to enlighten the machine/computer automatically analyzing human emotions, which has received increasing attention from researchers in various fields. Human beings generally observe and understand the emotional states of one person by integrating the perceived information from his/her facial expressions, voice tone, speech content, behavior, or physiological features. To imitate the emotion observation manner of humans, researchers have been devoted to constructing multimodal emotion recognition models by fusing information from two or more modalities. In this paper, we provide a comprehensive review of multimodal emotion recognition from the perspectives of multimodal datasets, data preprocessing, unimodal feature extraction, and multimodal information fusion methods in recent decades. Furthermore, challenges and future research directions of the topic are specified and discussed. The main motivations of this review are to conclude the recent emergence of abundant works on multimodal emotion recognition and to provide potential guidance to researchers in the related field for understanding the pipeline and mainstream approaches to multimodal emotion recognition.",

keywords = "Classifier, Emotion recognition, Feature learning, Multimodal information fusion",

author = "Bei Pan and Kaoru Hirota and Zhiyang Jia and Yaping Dai",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier B.V.",

year = "2023",

month = dec,

day = "7",

doi = "10.1016/j.neucom.2023.126866",

language = "English",

volume = "561",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods

AU - Pan, Bei

AU - Hirota, Kaoru

AU - Jia, Zhiyang

AU - Dai, Yaping

PY - 2023/12/7

Y1 - 2023/12/7

N2 - Affective computing is one of the most important research fields in modern human–computer interaction (HCI). The goal of affective computing is to study and develop the theories, methods, and systems that can recognize, explain, process, and simulate human emotions. As a branch of affective computing, emotion recognition aims to enlighten the machine/computer automatically analyzing human emotions, which has received increasing attention from researchers in various fields. Human beings generally observe and understand the emotional states of one person by integrating the perceived information from his/her facial expressions, voice tone, speech content, behavior, or physiological features. To imitate the emotion observation manner of humans, researchers have been devoted to constructing multimodal emotion recognition models by fusing information from two or more modalities. In this paper, we provide a comprehensive review of multimodal emotion recognition from the perspectives of multimodal datasets, data preprocessing, unimodal feature extraction, and multimodal information fusion methods in recent decades. Furthermore, challenges and future research directions of the topic are specified and discussed. The main motivations of this review are to conclude the recent emergence of abundant works on multimodal emotion recognition and to provide potential guidance to researchers in the related field for understanding the pipeline and mainstream approaches to multimodal emotion recognition.

AB - Affective computing is one of the most important research fields in modern human–computer interaction (HCI). The goal of affective computing is to study and develop the theories, methods, and systems that can recognize, explain, process, and simulate human emotions. As a branch of affective computing, emotion recognition aims to enlighten the machine/computer automatically analyzing human emotions, which has received increasing attention from researchers in various fields. Human beings generally observe and understand the emotional states of one person by integrating the perceived information from his/her facial expressions, voice tone, speech content, behavior, or physiological features. To imitate the emotion observation manner of humans, researchers have been devoted to constructing multimodal emotion recognition models by fusing information from two or more modalities. In this paper, we provide a comprehensive review of multimodal emotion recognition from the perspectives of multimodal datasets, data preprocessing, unimodal feature extraction, and multimodal information fusion methods in recent decades. Furthermore, challenges and future research directions of the topic are specified and discussed. The main motivations of this review are to conclude the recent emergence of abundant works on multimodal emotion recognition and to provide potential guidance to researchers in the related field for understanding the pipeline and mainstream approaches to multimodal emotion recognition.

KW - Classifier

KW - Emotion recognition

KW - Feature learning

KW - Multimodal information fusion

UR - http://www.scopus.com/inward/record.url?scp=85173580477&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2023.126866

DO - 10.1016/j.neucom.2023.126866

M3 - Short survey

AN - SCOPUS:85173580477

SN - 0925-2312

VL - 561

JO - Neurocomputing

JF - Neurocomputing

M1 - 126866

ER -