TY - JOUR
T1 - Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition
AU - Rao, Haocong
AU - Xu, Shihao
AU - Hu, Xiping
AU - Cheng, Jun
AU - Hu, Bin
N1 - Publisher Copyright: © 2021 Elsevier Inc.
PY - 2021/8
Y1 - 2021/8
N2 - Action recognition via 3D skeleton data is an important emerging topic. Most existing methods either rely on hand-crafted descriptors to recognize actions or perform supervised action representation learning with large numbers of labels. In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose to contrast the similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns (“pattern-invariance”) under different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, implemented as the momentum-based moving average of the LSTM-based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary that facilitates contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10–50% in Top-1 accuracy, and it can even achieve superior performance to many supervised learning methods. Our code is available at https://github.com/Mikexu007/AS-CAL.
AB - Action recognition via 3D skeleton data is an important emerging topic. Most existing methods either rely on hand-crafted descriptors to recognize actions or perform supervised action representation learning with large numbers of labels. In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose to contrast the similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns (“pattern-invariance”) under different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, implemented as the momentum-based moving average of the LSTM-based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary that facilitates contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10–50% in Top-1 accuracy, and it can even achieve superior performance to many supervised learning methods. Our code is available at https://github.com/Mikexu007/AS-CAL.
KW - Contrastive learning
KW - Momentum LSTM
KW - Skeleton based action recognition
KW - Skeleton data augmentation
KW - Unsupervised deep learning
UR - http://www.scopus.com/inward/record.url?scp=85104623527&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2021.04.023
DO - 10.1016/j.ins.2021.04.023
M3 - Article
AN - SCOPUS:85104623527
SN - 0020-0255
VL - 569
SP - 90
EP - 109
JO - Information Sciences
JF - Information Sciences
ER -