Multi-cue fusion for emotion recognition in the wild

Jingwei Yan, Wenming Zheng*, Zhen Cui, Chuangao Tang, Tong Zhang, Yuan Zong

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

87 Citations (Scopus)

Abstract

Emotion recognition has become a hot research topic in the past several years due to the large demand for this technology in many practical situations. One challenging task in this area is to recognize emotion types in a given video clip collected in the wild. To solve this problem, we propose a multi-cue fusion emotion recognition (MCFER) framework that models human emotions from three complementary cues, i.e., facial texture, facial landmark action, and the audio signal, and then fuses them together. To capture the dynamic change of facial texture, we employ a cascaded convolutional neural network (CNN) and bidirectional recurrent neural network (BRNN) architecture, where the facial image from each frame is first fed into the CNN to extract a high-level texture feature, and the resulting feature sequence is then fed into the BRNN to learn the changes within it. Facial landmark action explicitly models the movement of facial muscles; an SVM and a CNN are deployed to explore the emotion-related patterns in it. The audio signal is also modeled with a CNN by extracting low-level acoustic features from segmented clips and stacking them into an image-like matrix. We fuse these models at both the feature level and the decision level to further boost the overall performance. Experimental results on two challenging databases demonstrate the effectiveness and superiority of the proposed MCFER framework.
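To make the cascaded texture stream concrete, below is a minimal PyTorch sketch of a per-frame CNN feeding a bidirectional GRU, followed by score-averaging decision fusion. The `TextureStream` name, all layer sizes, the GRU choice, and the averaging scheme are illustrative assumptions; the abstract does not specify the paper's actual backbone or fusion rule.

```python
import torch
import torch.nn as nn

class TextureStream(nn.Module):
    """Cascaded CNN -> BRNN over a sequence of face frames.

    A minimal sketch of the facial-texture cue: the backbone, feature
    dimension, and hidden size below are assumptions, not the paper's
    actual configuration.
    """
    def __init__(self, feat_dim=256, hidden=128, num_classes=7):
        super().__init__()
        # Per-frame CNN standing in for the high-level texture extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Bidirectional RNN that reads the per-frame feature sequence.
        self.brnn = nn.GRU(feat_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # (B*T, feat_dim)
        out, _ = self.brnn(feats.view(b, t, -1))
        return self.head(out.mean(dim=1))      # temporal pooling -> logits

if __name__ == "__main__":
    model = TextureStream()
    clips = torch.randn(2, 16, 3, 64, 64)      # 2 clips of 16 frames each
    tex_logits = model(clips)                  # (2, num_classes)

    # Decision-level fusion, sketched here as averaging per-cue class
    # probabilities; the landmark and audio logits are random stand-ins.
    lm_logits = torch.randn_like(tex_logits)
    audio_logits = torch.randn_like(tex_logits)
    fused = torch.stack([s.softmax(dim=-1) for s in
                         (tex_logits, lm_logits, audio_logits)]).mean(dim=0)
    print(fused.shape)                         # torch.Size([2, 7])
```

Averaging softmax scores is only one simple decision-level rule; weighted sums or learned fusion layers over concatenated features (feature-level fusion) are equally plausible readings of the abstract.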

Original language: English
Pages (from-to): 27-35
Number of pages: 9
Journal: Neurocomputing
Volume: 309
DOI
Publication status: Published - 2 Oct 2018
Externally published: Yes
