Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition

Shihao Xu, Haocong Rao, Xiping Hu*, Jun Cheng*, Bin Hu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

26 Citations (Scopus)

Abstract

We focus on unsupervised representation learning for skeleton-based action recognition. Existing unsupervised approaches usually learn action representations through motion prediction, but they cannot fully capture inherent semantic similarity. In this paper, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP) to address this challenge. Unlike plain motion prediction, PCRP performs reverse motion prediction with an encoder-decoder structure to extract more discriminative temporal patterns, and derives action prototypes by clustering to explore the inherent action similarity within the action encodings. Specifically, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization (EM) task. PCRP iterates between (1) an E-step, which determines the distribution of action prototypes by clustering the action encodings from the encoder while estimating the concentration around each prototype, and (2) an M-step, which optimizes the model by minimizing the proposed ProtoMAE loss; this loss simultaneously pulls each action encoding closer to its assigned prototype through contrastive learning and performs the reverse motion prediction task. In addition, sorting can serve as a temporal task similar to reverse prediction in the proposed framework. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms mainstream unsupervised methods and even achieves superior performance over many supervised methods. The code is available at: https://github.com/LZUSIAT/PCRP.
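
The E-step/M-step alternation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of the idea under stated assumptions. The function names `e_step` and `proto_mae_loss`, the use of plain k-means, the concentration formula (mean member distance divided by a log of the cluster size), the softmax-over-prototypes contrastive term, and the unweighted sum of the two loss terms are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def e_step(encodings, num_prototypes, iters=10, seed=0):
    """Cluster encodings with plain k-means (an assumed clustering choice).

    Returns prototypes (K, D), hard assignments (N,), and a per-prototype
    concentration estimate (K,).
    """
    rng = np.random.default_rng(seed)
    # Initialise prototypes from randomly chosen encodings.
    idx = rng.choice(len(encodings), size=num_prototypes, replace=False)
    prototypes = encodings[idx].copy()
    for _ in range(iters):
        # Squared distance from every encoding to every prototype: (N, K).
        d2 = ((encodings[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        for k in range(num_prototypes):
            members = encodings[assign == k]
            if len(members) > 0:
                prototypes[k] = members.mean(axis=0)
    # Recompute assignments so they match the final prototypes.
    d2 = ((encodings[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(encodings)), assign])
    # Assumed concentration estimate: tighter clusters get a smaller value.
    concentration = np.ones(num_prototypes)
    for k in range(num_prototypes):
        member_dist = dist[assign == k]
        if len(member_dist) > 1:
            value = member_dist.mean() / np.log(len(member_dist) + 10.0)
            concentration[k] = max(value, 1e-3)  # avoid division by zero
    return prototypes, assign, concentration


def proto_mae_loss(encodings, prototypes, assign, concentration,
                   predicted, sequence):
    """Assumed loss: prototype contrast plus MAE on reverse prediction.

    predicted and sequence have shape (N, T, F); the prediction target is
    the input sequence reversed along the time axis.
    """
    # Contrastive term: softmax over prototypes, scaled by concentration.
    logits = encodings @ prototypes.T / concentration[None, :]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    contrast = -log_prob[np.arange(len(assign)), assign].mean()
    # Reverse prediction term: mean absolute error against reversed input.
    reverse_target = sequence[:, ::-1]
    mae = np.abs(predicted - reverse_target).mean()
    return contrast + mae, contrast, mae
```

In a training loop, one would presumably call `e_step` on the current encoder outputs at the start of each epoch and then minimize the loss by gradient descent in an autodiff framework; the NumPy version here only evaluates the loss and does not compute gradients.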

Original language: English
Pages (from-to): 624-634
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 25
Publication status: Published - 2023

Keywords

  • Prototypical contrast
  • skeleton based action recognition
  • unsupervised learning
