TY - JOUR
T1 - Adaptive key-frame selection-based facial expression recognition via multi-cue dynamic features hybrid fusion
AU - Pan, Bei
AU - Hirota, Kaoru
AU - Dai, Yaping
AU - Jia, Zhiyang
AU - Fukushima, Edwardo F.
AU - She, Jinhua
N1 - Publisher Copyright: © 2024 Elsevier Inc.
PY - 2024/3
Y1 - 2024/3
N2 - A multi-cue dynamic features hybrid fusion (MDF-HF) method for video-based facial expression recognition is presented. It is composed of key-frame selection, multi-cue dynamic feature extraction, and information fusion components. An adaptive key-frame selection strategy is first designed in the training procedure to extract pivotal facial images from video sequences, addressing the challenge of imbalanced data distribution and improving data quality. The similarity threshold used for key-frame selection is automatically adjusted based on the number of image frames in each expression category, creating a flexible frame processing procedure. Multi-cue spatio-temporal feature descriptors are then designed to acquire diverse dynamic feature representations from the selected key-frame sequences. With parallel computation, different levels of semantic information are extracted simultaneously to explore facial expression deformation in video clips. To integrate features from multiple cues, a weighted stacking ensemble strategy is devised, preserving unique feature characteristics while exploring interrelationships among the multi-cue features. The proposed method is evaluated on three benchmark datasets: eNTERFACE'05, BAUM-1s, and AFEW, achieving average accuracies of 59.7%, 57.5%, and 54.7%, respectively. The MDF-HF method exhibits superior performance compared to state-of-the-art methods in facial expression recognition, offering a robust solution for recognizing facial expressions in dynamic and unconstrained video scenarios.
AB - A multi-cue dynamic features hybrid fusion (MDF-HF) method for video-based facial expression recognition is presented. It is composed of key-frame selection, multi-cue dynamic feature extraction, and information fusion components. An adaptive key-frame selection strategy is first designed in the training procedure to extract pivotal facial images from video sequences, addressing the challenge of imbalanced data distribution and improving data quality. The similarity threshold used for key-frame selection is automatically adjusted based on the number of image frames in each expression category, creating a flexible frame processing procedure. Multi-cue spatio-temporal feature descriptors are then designed to acquire diverse dynamic feature representations from the selected key-frame sequences. With parallel computation, different levels of semantic information are extracted simultaneously to explore facial expression deformation in video clips. To integrate features from multiple cues, a weighted stacking ensemble strategy is devised, preserving unique feature characteristics while exploring interrelationships among the multi-cue features. The proposed method is evaluated on three benchmark datasets: eNTERFACE'05, BAUM-1s, and AFEW, achieving average accuracies of 59.7%, 57.5%, and 54.7%, respectively. The MDF-HF method exhibits superior performance compared to state-of-the-art methods in facial expression recognition, offering a robust solution for recognizing facial expressions in dynamic and unconstrained video scenarios.
KW - Dynamic feature learning
KW - Facial expression recognition
KW - Key-frame selection
KW - Multi-cue information fusion
KW - Stacking ensemble
UR - http://www.scopus.com/inward/record.url?scp=85182729583&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2024.120138
DO - 10.1016/j.ins.2024.120138
M3 - Article
AN - SCOPUS:85182729583
SN - 0020-0255
VL - 660
JO - Information Sciences
JF - Information Sciences
M1 - 120138
ER -