Patch attention convolutional vision transformer for facial expression recognition with occlusion

Chang Liu, Kaoru Hirota, Yaping Dai*

*Corresponding author for this work

Research output: Journal article, peer-reviewed

56 Citations (Scopus)

Abstract

Despite substantial progress in Facial Expression Recognition (FER) in recent decades, most previous methods have been developed to recognize constrained facial expressions. Real-world occlusions lead to invisible facial regions and contaminated facial features, which undoubtedly increase the difficulty of FER in the wild. Therefore, a Patch Attention Convolutional Vision Transformer (PACVT) is proposed to tackle the occlusion FER problem. The backbone convolutional neural network is used to extract facial feature maps, which are cropped into multiple regional patches to extract local and global features. The Patch Attention Unit (PAU) is designed to perceive occluded regions by adaptively calculating the patch-level attention weights of local features for expression recognition. The facial patches are mapped into sequences of visual tokens, and the Vision Transformer (ViT) is employed to capture the interactions and correlations between these visual tokens from a global perspective. The self-attention in ViT enables the PACVT to focus on the salient patches with discriminative features and ignore the occlusion. Experiments are conducted on three widely used expression datasets and their occlusion subsets, and the results demonstrate that the proposed PACVT outperforms state-of-the-art methods on occlusion FER. Cross-dataset experiment results evidence the generalization ability of the PACVT.
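
The pipeline described in the abstract (crop a feature map into patches, weight each patch, then let the patch tokens attend to one another) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single-channel 4x4 feature map, non-overlapping 2x2 patches, a sigmoid gate with random stand-in weights in place of the learned PAU parameters, and single-head self-attention with identity query/key/value projections. All function names are hypothetical.

```python
import math
import random


def crop_patches(feature_map, patch):
    """Split an H x W grid into non-overlapping patch x patch tiles,
    each flattened into a vector (one visual token per tile)."""
    h, w = len(feature_map), len(feature_map[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([feature_map[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def patch_attention(patches, weights, bias=0.0):
    """Assumed form of a patch-level gate: a sigmoid score in (0, 1)
    computed from each patch's own features, then used to scale it."""
    gated, scores = [], []
    for p in patches:
        logit = sum(x * w for x, w in zip(p, weights)) + bias
        score = 1.0 / (1.0 + math.exp(-logit))
        scores.append(score)
        gated.append([score * x for x in p])
    return gated, scores


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.
    Identity Q/K/V projections are a simplification for illustration."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        attn = softmax(logits)
        out.append([sum(a * v[i] for a, v in zip(attn, tokens))
                    for i in range(d)])
    return out


random.seed(0)
# Toy stand-in for a backbone CNN feature map (one channel, 4 x 4).
feature_map = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
# Random stand-in for gate parameters that would be learned in training.
gate_weights = [random.gauss(0.0, 1.0) for _ in range(4)]

patches = crop_patches(feature_map, patch=2)            # 4 tokens of length 4
gated, scores = patch_attention(patches, gate_weights)  # per-patch weights
tokens_out = self_attention(gated)                      # global interactions
```

In a trained model the gate weights would be learned so that occluded patches receive low scores. With the random weights used here the scores carry no such meaning, and the sketch only shows how data flows through the three stages.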

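Original language: English
Pages (from-to): 781-794
Number of pages: 14
Journal: Information Sciences
Volume: 619
DOI: https://doi.org/10.1016/j.ins.2022.11.068
Publication status: Published - Jan 2023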

Cite this

Liu, C., Hirota, K., & Dai, Y. (2023). Patch attention convolutional vision transformer for facial expression recognition with occlusion. Information Sciences, 619, 781-794. https://doi.org/10.1016/j.ins.2022.11.068