TY - JOUR
T1 - Prototypical Contrast and Reverse Prediction
T2 - Unsupervised Skeleton Based Action Recognition
AU - Xu, Shihao
AU - Rao, Haocong
AU - Hu, Xiping
AU - Cheng, Jun
AU - Hu, Bin
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - We focus on unsupervised representation learning for skeleton-based action recognition. Existing unsupervised approaches usually learn action representations via motion prediction, but they lack the ability to fully capture inherent semantic similarity. In this paper, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP) to address this challenge. Unlike plain motion prediction, PCRP performs reverse motion prediction based on an encoder-decoder structure to extract more discriminative temporal patterns, and derives action prototypes by clustering to explore the inherent action similarity within the action encodings. Specifically, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization (EM) task. PCRP iteratively runs (1) an E-step, which determines the distribution of action prototypes by clustering the action encodings from the encoder while estimating the concentration around each prototype, and (2) an M-step, which optimizes the model by minimizing the proposed ProtoMAE loss, simultaneously pulling each action encoding closer to its assigned prototype via contrastive learning and performing the reverse motion prediction task. Besides, sorting can also serve as a temporal task similar to reverse prediction in the proposed framework. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms mainstream unsupervised methods and even achieves superior performance over many supervised methods. The code is available at: https://github.com/LZUSIAT/PCRP.
AB - We focus on unsupervised representation learning for skeleton-based action recognition. Existing unsupervised approaches usually learn action representations via motion prediction, but they lack the ability to fully capture inherent semantic similarity. In this paper, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP) to address this challenge. Unlike plain motion prediction, PCRP performs reverse motion prediction based on an encoder-decoder structure to extract more discriminative temporal patterns, and derives action prototypes by clustering to explore the inherent action similarity within the action encodings. Specifically, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization (EM) task. PCRP iteratively runs (1) an E-step, which determines the distribution of action prototypes by clustering the action encodings from the encoder while estimating the concentration around each prototype, and (2) an M-step, which optimizes the model by minimizing the proposed ProtoMAE loss, simultaneously pulling each action encoding closer to its assigned prototype via contrastive learning and performing the reverse motion prediction task. Besides, sorting can also serve as a temporal task similar to reverse prediction in the proposed framework. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms mainstream unsupervised methods and even achieves superior performance over many supervised methods. The code is available at: https://github.com/LZUSIAT/PCRP.
KW - Prototypical contrast
KW - skeleton based action recognition
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85120049688&partnerID=8YFLogxK
U2 - 10.1109/TMM.2021.3129616
DO - 10.1109/TMM.2021.3129616
M3 - Article
AN - SCOPUS:85120049688
SN - 1520-9210
VL - 25
SP - 624
EP - 634
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -