Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition

Haocong Rao; Shihao Xu; Xiping Hu; Jun Cheng; Bin Hu

doi:10.1016/j.ins.2021.04.023

Haocong Rao, Shihao Xu, Xiping Hu^*, Jun Cheng, Bin Hu

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

137 Citations (Scopus)

Abstract

Action recognition via 3D skeleton data is an emerging important topic. Most existing methods rely on hand-crafted descriptors to recognize actions, or perform supervised action representation learning with massive labels. In this paper, we for the first time propose a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose to contrast similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns (“pattern-invariance”) in different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, which is implemented as the momentum-based moving average of LSTM based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary to facilitate contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10–50% Top-1 accuracy, and it can even achieve superior performance to many supervised learning methods (Our codes are available at https://github.com/Mikexu007/AS-CAL).
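The three mechanisms named in the abstract (a contrastive similarity objective, a momentum-averaged key encoder, and a queue of stored keys) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the momentum coefficient `m`, the temperature, the queue size, and the toy 2-D vectors standing in for LSTM outputs are all assumptions chosen for illustration, and the loss shown is a generic InfoNCE-style objective rather than the exact formulation in the paper.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: key weights drift slowly toward query weights.

    Hypothetical helper; `m` is an assumed momentum coefficient.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, queue, temperature=0.07):
    """Cross-entropy over similarity logits, with the positive key at index 0.

    Keys held in `queue` act as negatives. The temperature is an assumption.
    """
    q = normalize(query)
    logits = [dot(q, normalize(positive_key)) / temperature]
    logits += [dot(q, normalize(k)) / temperature for k in queue]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


# First-in, first-out queue of previously encoded keys (toy vectors).
key_queue = deque(maxlen=4)
for old_key in ([0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]):
    key_queue.append(old_key)

# A key that agrees with the query should give a lower loss than one that does not.
loss_aligned = info_nce([0.9, 0.1], [1.0, 0.05], key_queue)
loss_misaligned = info_nce([0.9, 0.1], [-1.0, 0.0], key_queue)

# Toy scalar "weights" to show the slow drift of the key encoder.
updated_key_params = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9)
```

In a full training loop, the query encoder would be updated by gradient descent on this loss, the key encoder would only receive the momentum update, and each batch of new keys would be appended to the queue so that the oldest entries are dropped.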

Original language	English
Pages (from-to)	90-109
Number of pages	20
Journal	Information Sciences
Volume	569
DOI	https://doi.org/10.1016/j.ins.2021.04.023
Publication status	Published - Aug 2021
Externally published	Yes

Cite this

@article{6417ed75856b4449902849685d3286ee,

title = "Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition",

abstract = "Action recognition via 3D skeleton data is an emerging important topic. Most existing methods rely on hand-crafted descriptors to recognize actions, or perform supervised action representation learning with massive labels. In this paper, we for the first time propose a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose to contrast similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns (“pattern-invariance”) in different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, which is implemented as the momentum-based moving average of LSTM based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary to facilitate contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10–50% Top-1 accuracy, and it can even achieve superior performance to many supervised learning methods (Our codes are available at https://github.com/Mikexu007/AS-CAL).",

keywords = "Contrastive learning, Momentum LSTM, Skeleton based action recognition, Skeleton data augmentation, Unsupervised deep learning",

author = "Haocong Rao and Shihao Xu and Xiping Hu and Jun Cheng and Bin Hu",

note = "Publisher Copyright: {\textcopyright} 2021 Elsevier Inc.",

year = "2021",

month = aug,

doi = "10.1016/j.ins.2021.04.023",

language = "English",

volume = "569",

pages = "90--109",

journal = "Information Sciences",

issn = "0020-0255",

publisher = "Elsevier Inc.",

}

TY - JOUR

T1 - Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition

AU - Rao, Haocong

AU - Xu, Shihao

AU - Hu, Xiping

AU - Cheng, Jun

AU - Hu, Bin

PY - 2021/8

Y1 - 2021/8

N2 - Action recognition via 3D skeleton data is an emerging important topic. Most existing methods rely on hand-crafted descriptors to recognize actions, or perform supervised action representation learning with massive labels. In this paper, we for the first time propose a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose to contrast similarity between augmented instances of the input skeleton sequence, which are transformed with multiple novel augmentation strategies, to learn inherent action patterns (“pattern-invariance”) in different skeleton transformations. Second, to encourage learning the pattern-invariance with more consistent action representations, we propose a momentum LSTM, which is implemented as the momentum-based moving average of LSTM based query encoder, to encode long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary to facilitate contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Empirical evaluations show that our approach significantly outperforms hand-crafted methods by 10–50% Top-1 accuracy, and it can even achieve superior performance to many supervised learning methods (Our codes are available at https://github.com/Mikexu007/AS-CAL).

KW - Contrastive learning

KW - Momentum LSTM

KW - Skeleton based action recognition

KW - Skeleton data augmentation

KW - Unsupervised deep learning

UR - http://www.scopus.com/inward/record.url?scp=85104623527&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2021.04.023

DO - 10.1016/j.ins.2021.04.023

M3 - Article

AN - SCOPUS:85104623527

SN - 0020-0255

VL - 569

SP - 90

EP - 109

JO - Information Sciences

JF - Information Sciences

ER -
