TY - GEN
T1 - Multi-clue fusion for emotion recognition in the wild
AU - Yan, Jingwei
AU - Zheng, Wenming
AU - Cui, Zhen
AU - Tang, Chuangao
AU - Zhang, Tong
AU - Zong, Yuan
AU - Sun, Ning
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/10/31
Y1 - 2016/10/31
N2 - In the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn increasing attention due to its great application potential. In the fourth challenge, which targets video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture in which the face image from each frame is first fed into a fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN to capture the dynamic changes of facial texture. To capture facial actions more accurately, a facial landmark trajectory model is proposed to explicitly learn the emotion variations of facial components. Furthermore, audio signals are modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them into an image-like map. Finally, we fuse the results generated from the three clues to boost emotion recognition performance. Our proposed MCEF achieves an overall accuracy of 56.66%, a large improvement of 16.19% over the baseline.
AB - In the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn increasing attention due to its great application potential. In the fourth challenge, which targets video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture in which the face image from each frame is first fed into a fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN to capture the dynamic changes of facial texture. To capture facial actions more accurately, a facial landmark trajectory model is proposed to explicitly learn the emotion variations of facial components. Furthermore, audio signals are modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them into an image-like map. Finally, we fuse the results generated from the three clues to boost emotion recognition performance. Our proposed MCEF achieves an overall accuracy of 56.66%, a large improvement of 16.19% over the baseline.
KW - AFEW
KW - Convolutional neural network (CNN)
KW - Emotion recognition in the wild
KW - Multi-clue
KW - Recurrent neural network (RNN)
UR - http://www.scopus.com/inward/record.url?scp=85016557815&partnerID=8YFLogxK
U2 - 10.1145/2993148.2997630
DO - 10.1145/2993148.2997630
M3 - Conference contribution
AN - SCOPUS:85016557815
T3 - ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
SP - 458
EP - 463
BT - ICMI 2016 - Proceedings of the 18th ACM International Conference on Multimodal Interaction
A2 - Pelachaud, Catherine
A2 - Nakano, Yukiko I.
A2 - Nishida, Toyoaki
A2 - Busso, Carlos
A2 - Morency, Louis-Philippe
A2 - Andre, Elisabeth
PB - Association for Computing Machinery, Inc
T2 - 18th ACM International Conference on Multimodal Interaction, ICMI 2016
Y2 - 12 November 2016 through 16 November 2016
ER -