Attention-Rectified and Texture-Enhanced Cross-Attention Transformer Feature Fusion Network for Facial Expression Recognition

Mingyi Sun, Weigang Cui, Yue Zhang, Shuyue Yu, Xiaofeng Liao, Bin Hu, Yang Li*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

Facial expression recognition (FER) in the wild is a challenging task for affective computing in human-machine interaction. Because FER datasets are commonly imbalanced, most existing methods trained with a simple cross-entropy loss fail to learn the most prominent regions of facial images, which limits the robustness and interpretability of the model. In addition, these methods capture only local features of the original images with multisize shallow convolutions and ignore facial texture characteristics, leading to suboptimal recognition performance. To address these issues, in this article we propose a novel FER network, named the attention-rectified and texture-enhanced cross-attention transformer feature fusion network (AR-TE-CATFFNet). First, an attention-rectified convolution block is designed to help multiple convolution heads focus on the critical areas of human faces and to improve model generalization. Second, we investigate a texture enhancement block that captures texture features through the local binary pattern and the gray-level co-occurrence matrix, addressing the lack of texture information. Finally, a cross-attention transformer feature fusion block globally integrates red, green, blue (RGB) features and texture features to improve recognition accuracy. Experimental results on three public datasets validate the efficacy of the proposed method, which achieves classification performance of 89.50% on the Real-world Affective Faces Database (RAF-DB), 65.66% on AffectNet, and 74.84% on FER2013, outperforming the existing methods.
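To make two of the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of a basic 3x3 local binary pattern and a single-head scaled dot-product cross-attention step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the architecture details, so the function names (`lbp_3x3`, `cross_attention`), the choice of RGB features as queries and texture features as keys/values, the projection matrices, and all shapes are hypothetical. The gray-level co-occurrence matrix and the attention-rectified convolution block are not covered here.

```python
import numpy as np


def lbp_3x3(gray):
    """Basic 8-neighbour local binary pattern codes for the interior pixels.

    Hypothetical helper for illustration; the paper's exact LBP variant
    is not stated in the abstract.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Assumption: `queries` stands in for RGB feature tokens and `keys_values`
    for texture feature tokens; the paper may arrange this differently.
    """
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row maximum for a numerically stable softmax.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(6, 6))
    print(lbp_3x3(image).shape)          # interior pixels only: (4, 4)

    rgb_tokens = rng.normal(size=(4, 8))      # 4 hypothetical RGB tokens
    texture_tokens = rng.normal(size=(6, 8))  # 6 hypothetical texture tokens
    w_q, w_k, w_v = (rng.normal(size=(8, 5)) for _ in range(3))
    fused, attn = cross_attention(rgb_tokens, texture_tokens, w_q, w_k, w_v)
    print(fused.shape, attn.shape)
```

In this sketch each RGB token receives a weighted mixture of projected texture tokens, which is one simple way to let every position in one feature stream attend to all positions in the other.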

Original language: English
Pages (from-to): 11823-11832
Number of pages: 10
Journal: IEEE Transactions on Industrial Informatics
Volume: 19
Issue number: 12
DOIs
Publication status: Published - 1 Dec 2023
Externally published: Yes

Keywords

  • Convolutional neural network (CNN)
  • face detector
  • facial expression recognition (FER)
  • texture features
  • transformer
