TY - JOUR
T1 - A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods
AU - Pan, Bei
AU - Hirota, Kaoru
AU - Jia, Zhiyang
AU - Dai, Yaping
N1 - Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/12/7
Y1 - 2023/12/7
N2 - Affective computing is one of the most important research fields in modern human–computer interaction (HCI). Its goal is to develop theories, methods, and systems that can recognize, interpret, process, and simulate human emotions. As a branch of affective computing, emotion recognition aims to enable machines to analyze human emotions automatically, and it has received increasing attention from researchers in various fields. Humans generally perceive another person's emotional state by integrating cues from facial expressions, voice tone, speech content, behavior, and physiological signals. To imitate this observation process, researchers have worked to construct multimodal emotion recognition models that fuse information from two or more modalities. In this paper, we provide a comprehensive review of multimodal emotion recognition over recent decades from the perspectives of multimodal datasets, data preprocessing, unimodal feature extraction, and multimodal information fusion methods. Furthermore, challenges and future research directions are identified and discussed. The main motivations of this review are to summarize the recent surge of work on multimodal emotion recognition and to offer researchers in the related field guidance on the pipeline and mainstream approaches to the task.
KW - Classifier
KW - Emotion recognition
KW - Feature learning
KW - Multimodal information fusion
UR - http://www.scopus.com/inward/record.url?scp=85173580477&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2023.126866
DO - 10.1016/j.neucom.2023.126866
M3 - Short survey
AN - SCOPUS:85173580477
SN - 0925-2312
VL - 561
JO - Neurocomputing
JF - Neurocomputing
M1 - 126866
ER -