Two-Stage Self-Supervised Learning for Facial Action Unit Recognition

Hao Cheng, Xiang Xie, Shuang Liang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

This paper proposes a two-stage self-supervised method for facial action unit (AU) recognition. In the first stage, an auto-encoder approach is applied: the encoder operates on only a small proportion (e.g., 40%) of the image patches, and the decoder reconstructs the original image from the latent features and learnable mask tokens. After training, the encoder is adapted to the task of AU recognition, yet poor results are observed for certain AU classes. To address this problem, a second stage applies contrastive learning to learn discriminative features. The method uses images from the VGG-Face2 dataset, which vary in head pose, age and background. Experiments on AU recognition show that the two-stage method strengthens representation quality. Compared to previous self-supervised methods, the pre-trained encoder achieves the best linear probing result on the DISFA dataset, with an F1-score of 53.8%. A fine-tuning experiment is also conducted and obtains an F1-score of 59.9%, a gap of roughly 3% to the existing state-of-the-art method. The two-stage training method is easy to implement and can be extended in further research.
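
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives only the "e.g., 40%" visible-patch proportion, so the uniform random sampling, the 14 x 14 patch grid, the InfoNCE form of the contrastive loss, the temperature value, and the function names random_patch_mask and info_nce are all assumptions made for illustration.

```python
import math
import random


def random_patch_mask(num_patches, keep_ratio=0.4, seed=None):
    """Split patch indices into the visible set (fed to the encoder) and
    the masked set (replaced by learnable mask tokens in the decoder).

    keep_ratio=0.4 mirrors the "e.g., 40%" figure in the abstract; uniform
    random sampling is an assumption.
    """
    rng = random.Random(seed)
    num_keep = int(round(num_patches * keep_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for a single anchor embedding, using cosine similarity.

    An assumed stand-in for the contrastive objective, which the abstract
    does not specify.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    # The positive pair goes first, followed by all negative pairs.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum - logits[0]


# Assumed setup: a 14 x 14 grid of patches (196 in total) per image.
visible, masked = random_patch_mask(196, keep_ratio=0.4, seed=0)
print(len(visible), len(masked))  # prints "78 118"
```

With these assumptions the encoder would see 78 of the 196 patches, while the decoder receives the encoded patches plus mask tokens at the 118 masked positions. The contrastive loss is close to zero when the anchor matches its positive and is dissimilar to the negatives, and grows as the negatives become more similar to the anchor than the positive.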

Original language: English
Title of host publication: IVSP 2022 - 2022 4th International Conference on Image, Video and Signal Processing
Publisher: Association for Computing Machinery
Pages: 80-84
Number of pages: 5
ISBN (Electronic): 9781450387415
Publication status: Published - 18 Mar 2022
Event: 4th International Conference on Image, Video and Signal Processing, IVSP 2022 - Virtual, Online, Singapore
Duration: 18 Mar 2022 to 20 Mar 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 4th International Conference on Image, Video and Signal Processing, IVSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 18/03/22 to 20/03/22

Keywords

  • Facial action unit recognition
  • Self-supervised learning
  • Vision Transformers
