TY - JOUR
T1 - Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition
AU - Sun, Mingyi
AU - Cui, Weigang
AU - Zhang, Yue
AU - Yu, Shuyue
AU - Liao, Xiaofeng
AU - Hu, Bin
AU - Li, Yang
N1 - Publisher Copyright:
© 2005-2012 IEEE.
PY - 2023/12/1
Y1 - 2023/12/1
N2 - Facial expression recognition (FER) in the wild is a challenging task for affective computing in human-machine interaction fields. However, most of the existing methods fail to learn the most prominent regions of facial images by simple cross-entropy loss due to the imbalance problem commonly existing in FER datasets, which limits the robustness and interpretability of the model. In addition, these methods only capture local features of original images with multisize shallow convolution and ignore facial texture characteristics, leading to a suboptimal recognition performance. To address these issues, in this article, we propose a novel FER network, named the attention-rectified and texture-enhanced cross-attention transformer feature fusion network (AR-TE-CATFFNet). Specifically, an attention-rectified convolution block is first designed to assist multiple convolution heads to focus on the critical areas of human faces and improve the model generalization. Second, we investigate a texture enhancement block to capture texture features through local binary pattern and gray-level co-occurrence matrix, which solves the limitation of insufficient texture information. Finally, a cross-attention transformer feature fusion block is employed to deeply integrate red, green, blue (RGB) features and texture features globally, which is beneficial to boost the accuracy of recognition. Competitive experimental results on three public datasets validate the efficacy of the proposed method, indicating that our proposed method achieves superior classification performance of 89.50% on real-world affective faces database (RAF-DB) dataset, 65.66% on AffectNet dataset, and 74.84% on FER2013 dataset against the existing methods.
AB - Facial expression recognition (FER) in the wild is a challenging task for affective computing in human-machine interaction fields. However, most of the existing methods fail to learn the most prominent regions of facial images by simple cross-entropy loss due to the imbalance problem commonly existing in FER datasets, which limits the robustness and interpretability of the model. In addition, these methods only capture local features of original images with multisize shallow convolution and ignore facial texture characteristics, leading to a suboptimal recognition performance. To address these issues, in this article, we propose a novel FER network, named the attention-rectified and texture-enhanced cross-attention transformer feature fusion network (AR-TE-CATFFNet). Specifically, an attention-rectified convolution block is first designed to assist multiple convolution heads to focus on the critical areas of human faces and improve the model generalization. Second, we investigate a texture enhancement block to capture texture features through local binary pattern and gray-level co-occurrence matrix, which solves the limitation of insufficient texture information. Finally, a cross-attention transformer feature fusion block is employed to deeply integrate red, green, blue (RGB) features and texture features globally, which is beneficial to boost the accuracy of recognition. Competitive experimental results on three public datasets validate the efficacy of the proposed method, indicating that our proposed method achieves superior classification performance of 89.50% on real-world affective faces database (RAF-DB) dataset, 65.66% on AffectNet dataset, and 74.84% on FER2013 dataset against the existing methods.
KW - Convolutional neural network (CNN)
KW - face detector
KW - facial expression recognition (FER)
KW - texture features
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85149876909&partnerID=8YFLogxK
U2 - 10.1109/TII.2023.3253188
DO - 10.1109/TII.2023.3253188
M3 - Article
AN - SCOPUS:85149876909
SN - 1551-3203
VL - 19
SP - 11823
EP - 11832
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 12
ER -