Multi-clue fusion for emotion recognition in the wild

Jingwei Yan, Wenming Zheng*, Zhen Cui, Chuangao Tang, Tong Zhang, Yuan Zong, Ning Sun

*Corresponding author for this work

Research output: Conference contribution (peer-reviewed)

36 Citations (Scopus)

Abstract

In the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn increasing attention due to its broad potential applications. For the fourth challenge, which targets video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture, in which the face image from each frame is first fed into a fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN to capture dynamic changes in facial texture. To attain more accurate facial actions, a facial landmark trajectory model is proposed to explicitly learn emotion variations of facial components. Further, audio signals are modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them into an image-like map. Finally, we fuse the results generated from the three clues to boost emotion recognition performance. Our proposed MCEF achieves an overall accuracy of 56.66%, a large improvement of 16.19% over the baseline.
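The final fusion step described in the abstract can be illustrated with a minimal score-level fusion sketch. This is a hypothetical reconstruction, not the paper's actual method: the per-class scores, the emotion label set, and the fusion weights below are all illustrative placeholders, since the abstract does not specify how the three models' outputs are combined.

```python
# Hypothetical score-level fusion of the three clues (facial texture
# CNN-RNN, landmark trajectory model, audio CNN). Scores and weights
# are made-up examples; the paper does not give concrete values.

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"]

def fuse_scores(texture, landmark, audio, weights=(0.5, 0.25, 0.25)):
    """Weighted sum of per-class scores from the three modality models."""
    wt, wl, wa = weights
    return [wt * t + wl * l + wa * a
            for t, l, a in zip(texture, landmark, audio)]

def predict(fused):
    """Return the emotion label with the highest fused score."""
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

# Example: each model emits a probability-like score per emotion class.
texture  = [0.10, 0.05, 0.10, 0.60, 0.05, 0.05, 0.05]
landmark = [0.20, 0.05, 0.05, 0.50, 0.10, 0.05, 0.05]
audio    = [0.10, 0.10, 0.10, 0.40, 0.20, 0.05, 0.05]

fused = fuse_scores(texture, landmark, audio)
label = predict(fused)  # → "Happy"
```

Because the weights sum to one and each input is a normalized distribution, the fused scores also sum to one; any of the three clues can then tip the final decision when the others are ambiguous.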

Original language: English
Title of host publication: ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
Editors: Catherine Pelachaud, Yukiko I. Nakano, Toyoaki Nishida, Carlos Busso, Louis-Philippe Morency, Elisabeth Andre
Publisher: Association for Computing Machinery, Inc
Pages: 458-463
Number of pages: 6
ISBN (Electronic): 9781450345569
Publication status: Published - 31 Oct 2016
Externally published: Yes
Event: 18th ACM International Conference on Multimodal Interaction, ICMI 2016 - Tokyo, Japan
Duration: 12 Nov 2016 - 16 Nov 2016

Publication series

Name: ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction

Conference

Conference: 18th ACM International Conference on Multimodal Interaction, ICMI 2016
Country/Territory: Japan
City: Tokyo
Period: 12/11/16 - 16/11/16

Keywords

  • AFEW
  • Convolutional neural network (CNN)
  • Emotion recognition in the wild
  • Multi-clue
  • Recurrent neural network (RNN)
