Multimodal emotion recognition based on feature selection and extreme learning machine in video clips

Bei Pan; Kaoru Hirota; Zhiyang Jia; Linhui Zhao; Xiaoming Jin; Yaping Dai

doi:10.1007/s12652-021-03407-2

Corresponding authors: Zhiyang Jia, Linhui Zhao

School of Automation

Research output: Contribution to journal › Article › peer-review

19 Citations (Scopus)

Abstract

Multimodal fusion-based emotion recognition has attracted increasing attention in affective computing because different modalities can achieve information complementation. One of the main challenges for reliable and effective model design is to define and extract appropriate emotional features from different modalities. In this paper, we present a novel multimodal emotion recognition framework to estimate categorical emotions, where visual and audio signals are utilized as multimodal input. The model learns neural appearance and key emotion frame using a statistical geometric method, which acts as a preprocessor to save computation power. Discriminative emotion features expressed from visual and audio modalities are extracted through evolutionary optimization, and then fed to the optimized extreme learning machine (ELM) classifiers for unimodal emotion recognition. Finally, a decision-level fusion strategy is applied to integrate the results of predicted emotions by the different classifiers to enhance the overall performance. The effectiveness of the proposed method is demonstrated through three public datasets, i.e., the acted CK+ dataset, the acted Enterface05 dataset, and the spontaneous BAUM-1s dataset. Average recognition rates of 93.53% on CK+, 91.62% on Enterface05, and 60.77% on BAUM-1s are obtained. The emotion recognition results acquired by fusing visual and audio predicted emotions are superior to both unimodal recognition and concatenation of individual features.
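The abstract does not state which decision-level fusion rule is used, so the following is only a minimal sketch of the general idea, assuming a weighted sum of per-class scores from one visual and one audio classifier. The label set `EMOTIONS`, the function names `fuse_decisions` and `predict_emotion`, the `visual_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence

# Hypothetical label set; the datasets named in the abstract may use other labels.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")


def fuse_decisions(
    visual_scores: Sequence[float],
    audio_scores: Sequence[float],
    visual_weight: float = 0.5,
) -> Dict[str, float]:
    """Combine two per-class score vectors with a weighted sum (assumed rule)."""
    if len(visual_scores) != len(EMOTIONS) or len(audio_scores) != len(EMOTIONS):
        raise ValueError("each classifier must return one score per emotion")
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")

    audio_weight = 1.0 - visual_weight
    fused = [
        visual_weight * v + audio_weight * a
        for v, a in zip(visual_scores, audio_scores)
    ]

    # Renormalize so the fused scores sum to 1 and can be read as probabilities.
    total = sum(fused)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {label: score / total for label, score in zip(EMOTIONS, fused)}


def predict_emotion(fused: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Made-up unimodal outputs for a single clip, for illustration only.
    visual = [0.10, 0.05, 0.05, 0.60, 0.10, 0.10]
    audio = [0.30, 0.05, 0.05, 0.40, 0.10, 0.10]
    fused = fuse_decisions(visual, audio, visual_weight=0.6)
    print(predict_emotion(fused))
```

Fusing at the decision level keeps each unimodal classifier independent, which is the property the abstract contrasts with concatenating the individual features before classification.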

Original language	English
Pages (from-to)	1903-1917
Number of pages	15
Journal	Journal of Ambient Intelligence and Humanized Computing
Volume	14
Issue number	3
DOI	https://doi.org/10.1007/s12652-021-03407-2
Publication status	Published - Mar 2023

Cite this

Pan, B., Hirota, K., Jia, Z., Zhao, L., Jin, X., & Dai, Y. (2023). Multimodal emotion recognition based on feature selection and extreme learning machine in video clips. Journal of Ambient Intelligence and Humanized Computing, 14(3), 1903-1917. https://doi.org/10.1007/s12652-021-03407-2

@article{4e3bdd015f654361ac1bbfd28e26abe4,

title = "Multimodal emotion recognition based on feature selection and extreme learning machine in video clips",

abstract = "Multimodal fusion-based emotion recognition has attracted increasing attention in affective computing because different modalities can achieve information complementation. One of the main challenges for reliable and effective model design is to define and extract appropriate emotional features from different modalities. In this paper, we present a novel multimodal emotion recognition framework to estimate categorical emotions, where visual and audio signals are utilized as multimodal input. The model learns neural appearance and key emotion frame using a statistical geometric method, which acts as a pre-processer for saving computation power. Discriminative emotion features expressed from visual and audio modalities are extracted through evolutionary optimization, and then fed to the optimized extreme learning machine (ELM) classifiers for unimodal emotion recognition. Finally, a decision-level fusion strategy is applied to integrate the results of predicted emotions by the different classifiers to enhance the overall performance. The effectiveness of the proposed method is demonstrated through three public datasets, i.e., the acted CK+ dataset, the acted Enterface05 dataset, and the spontaneous BAUM-1s dataset. An average recognition rate of 93.53% on CK+, 91.62% on Enterface05, and 60.77% on BAUM-1s are obtained. The emotion recognition results acquired by fusing visual and audio predicted emotions are superior to both recognition of unimodality and concatenation of individual features.",

keywords = "Emotion recognition, Evolutionary optimization, Extreme learning machine, Feature selection, Multimodal fusion",

author = "Bei Pan and Kaoru Hirota and Zhiyang Jia and Linhui Zhao and Xiaoming Jin and Yaping Dai",

note = "Publisher Copyright: {\textcopyright} 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.",

year = "2023",

month = mar,

doi = "10.1007/s12652-021-03407-2",

language = "English",

volume = "14",

pages = "1903--1917",

journal = "Journal of Ambient Intelligence and Humanized Computing",

issn = "1868-5137",

publisher = "Springer Verlag",

number = "3",

}

TY - JOUR

T1 - Multimodal emotion recognition based on feature selection and extreme learning machine in video clips

AU - Pan, Bei

AU - Hirota, Kaoru

AU - Jia, Zhiyang

AU - Zhao, Linhui

AU - Jin, Xiaoming

AU - Dai, Yaping

PY - 2023/3

Y1 - 2023/3

N2 - Multimodal fusion-based emotion recognition has attracted increasing attention in affective computing because different modalities can achieve information complementation. One of the main challenges for reliable and effective model design is to define and extract appropriate emotional features from different modalities. In this paper, we present a novel multimodal emotion recognition framework to estimate categorical emotions, where visual and audio signals are utilized as multimodal input. The model learns neural appearance and key emotion frame using a statistical geometric method, which acts as a pre-processer for saving computation power. Discriminative emotion features expressed from visual and audio modalities are extracted through evolutionary optimization, and then fed to the optimized extreme learning machine (ELM) classifiers for unimodal emotion recognition. Finally, a decision-level fusion strategy is applied to integrate the results of predicted emotions by the different classifiers to enhance the overall performance. The effectiveness of the proposed method is demonstrated through three public datasets, i.e., the acted CK+ dataset, the acted Enterface05 dataset, and the spontaneous BAUM-1s dataset. An average recognition rate of 93.53% on CK+, 91.62% on Enterface05, and 60.77% on BAUM-1s are obtained. The emotion recognition results acquired by fusing visual and audio predicted emotions are superior to both recognition of unimodality and concatenation of individual features.

AB - Multimodal fusion-based emotion recognition has attracted increasing attention in affective computing because different modalities can achieve information complementation. One of the main challenges for reliable and effective model design is to define and extract appropriate emotional features from different modalities. In this paper, we present a novel multimodal emotion recognition framework to estimate categorical emotions, where visual and audio signals are utilized as multimodal input. The model learns neural appearance and key emotion frame using a statistical geometric method, which acts as a pre-processer for saving computation power. Discriminative emotion features expressed from visual and audio modalities are extracted through evolutionary optimization, and then fed to the optimized extreme learning machine (ELM) classifiers for unimodal emotion recognition. Finally, a decision-level fusion strategy is applied to integrate the results of predicted emotions by the different classifiers to enhance the overall performance. The effectiveness of the proposed method is demonstrated through three public datasets, i.e., the acted CK+ dataset, the acted Enterface05 dataset, and the spontaneous BAUM-1s dataset. An average recognition rate of 93.53% on CK+, 91.62% on Enterface05, and 60.77% on BAUM-1s are obtained. The emotion recognition results acquired by fusing visual and audio predicted emotions are superior to both recognition of unimodality and concatenation of individual features.

KW - Emotion recognition

KW - Evolutionary optimization

KW - Extreme learning machine

KW - Feature selection

KW - Multimodal fusion

UR - http://www.scopus.com/inward/record.url?scp=85111486941&partnerID=8YFLogxK

U2 - 10.1007/s12652-021-03407-2

DO - 10.1007/s12652-021-03407-2

M3 - Article

AN - SCOPUS:85111486941

SN - 1868-5137

VL - 14

SP - 1903

EP - 1917

JO - Journal of Ambient Intelligence and Humanized Computing

JF - Journal of Ambient Intelligence and Humanized Computing

IS - 3

ER -
