TY - JOUR
T1 - Deep spatial and channel sliding attention patches for pose-invariant facial expression recognition
AU - Fan, Xiaoyu
AU - Liu, Chaoji
AU - Li, Shuangxi
AU - Yao, Jie
AU - Han, Xinfeng
AU - Liu, Xingqiao
AU - Chen, Chong
N1 - Publisher Copyright: © 2026
PY - 2026/6
Y1 - 2026/6
N2 - Pose-invariant facial expression recognition (FER) is an important yet challenging research topic in computer vision, especially because pose change and self-occlusion cause recognition results to vary from one viewing angle to another. In this paper, we propose a sliding patch network combined with spatial and channel attention (SPA-SE) for pose-invariant FER. The proposed network comprises three main components: a slide patch (SP) model, a spatial-level patch attention (SPA) model, and a channel-level attention (squeeze-and-extraction) model. The SP model is designed to determine the optimal patch size and stride, reducing the impact of pose variation on recognition accuracy. The SPA model guides the network to focus on regional features and adaptively assigns weights to indicate the importance of each local patch. The channel-level attention model is embedded into the bottleneck block to provide more salient feature maps for the SPA model. To evaluate the effectiveness of the SPA-SE network, we conducted experiments on five pose-invariant FER datasets. On three controlled FER datasets (BU3DFEP1, BU3DFEP2, and Multi-PIE), the network achieved accuracies of 78.01%, 81.65%, and 86.77%, respectively. On two real-world FER datasets, it achieved 86.76% (>30°) and 85.92% (>45°) on Pose-RAFDB, and 59.84% (>30°) and 60.36% (>45°) on Pose-Affect. The results demonstrate that our method can effectively improve recognition accuracy in practical applications.
AB - Pose-invariant facial expression recognition (FER) is an important yet challenging research topic in computer vision, especially because pose change and self-occlusion cause recognition results to vary from one viewing angle to another. In this paper, we propose a sliding patch network combined with spatial and channel attention (SPA-SE) for pose-invariant FER. The proposed network comprises three main components: a slide patch (SP) model, a spatial-level patch attention (SPA) model, and a channel-level attention (squeeze-and-extraction) model. The SP model is designed to determine the optimal patch size and stride, reducing the impact of pose variation on recognition accuracy. The SPA model guides the network to focus on regional features and adaptively assigns weights to indicate the importance of each local patch. The channel-level attention model is embedded into the bottleneck block to provide more salient feature maps for the SPA model. To evaluate the effectiveness of the SPA-SE network, we conducted experiments on five pose-invariant FER datasets. On three controlled FER datasets (BU3DFEP1, BU3DFEP2, and Multi-PIE), the network achieved accuracies of 78.01%, 81.65%, and 86.77%, respectively. On two real-world FER datasets, it achieved 86.76% (>30°) and 85.92% (>45°) on Pose-RAFDB, and 59.84% (>30°) and 60.36% (>45°) on Pose-Affect. The results demonstrate that our method can effectively improve recognition accuracy in practical applications.
KW - Channel-level attention model
KW - Deep convolutional neural network
KW - Pose-invariant FER
KW - Sliding patch model
KW - Spatial-level attention model
UR - https://www.scopus.com/pages/publications/105034746780
U2 - 10.1016/j.gmod.2026.101327
DO - 10.1016/j.gmod.2026.101327
M3 - Article
AN - SCOPUS:105034746780
SN - 1524-0703
VL - 145
JO - Graphical Models
JF - Graphical Models
M1 - 101327
ER -