A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification

Haocong Rao; Siqi Wang; Xiping Hu; Mingkui Tan; Yi Guo; Jun Cheng; Xinwang Liu; Bin Hu

doi:10.1109/TPAMI.2021.3092833

School of Medical and Technology

Research output: Contribution to journal › Article › peer-review

49 Citations (Scopus)

Abstract

Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions rely on either hand-crafted descriptors or supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences in reverse order, which involves richer high-level semantics to obtain better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion's continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40% in Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods with extra RGB or depth information.
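
The reverse-reconstruction pretext task mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the type aliases, the function names `reverse_target` and `reconstruction_loss`, the choice of mean squared error, and the toy coordinates are all assumptions made here for illustration. It only shows how a training target might be derived from an unlabeled skeleton sequence without any identity labels.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one body joint
Skeleton = List[Joint]               # all joints in one frame
Sequence = List[Skeleton]            # frames in temporal order


def reverse_target(sequence: Sequence) -> Sequence:
    """Return the frames in reverse temporal order (hypothetical helper)."""
    return sequence[::-1]


def reconstruction_loss(predicted: Sequence, target: Sequence) -> float:
    """Mean squared error over every joint coordinate (an assumed loss)."""
    total, count = 0.0, 0
    for pred_frame, tgt_frame in zip(predicted, target):
        for pred_joint, tgt_joint in zip(pred_frame, tgt_frame):
            for p, t in zip(pred_joint, tgt_joint):
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


# Toy sequence: 3 frames, 2 joints each (values are made up for illustration).
sequence: Sequence = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0)],
    [(0.2, 0.0, 0.0), (0.2, 1.0, 0.0)],
]

# The self-supervised target is the same sequence played backwards.
target = reverse_target(sequence)
```

In a setup like this, an encoder-decoder would read `sequence` and be trained to minimise `reconstruction_loss` between its output and `target`; the supervision signal comes entirely from the data itself.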

Original language	English
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
DOIs	https://doi.org/10.1109/TPAMI.2021.3092833
Publication status	Accepted/In press - 2021

Keywords

Computational modeling
Contrastive Learning
Encoding
Feature extraction
Gait
Locality-Aware Attention
Self-Supervised Deep Learning
Skeleton
Skeleton Based Person Re-Identification
Solid modeling
Task analysis
Three-dimensional displays

Cite this

@article{9da86755a0d64d88bf927f4ada31e9ac,
title = "A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification",
abstract = "Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions rely on either hand-crafted descriptors or supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences in reverse order, which involves richer high-level semantics to obtain better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion's continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40\% in Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods with extra RGB or depth information.",
keywords = "Computational modeling, Contrastive Learning, Encoding, Feature extraction, Gait, Locality-Aware Attention, Self-Supervised Deep Learning, Skeleton, Skeleton Based Person Re-Identification, Solid modeling, Task analysis, Three-dimensional displays",
author = "Haocong Rao and Siqi Wang and Xiping Hu and Mingkui Tan and Yi Guo and Jun Cheng and Xinwang Liu and Bin Hu",
note = "Publisher Copyright: IEEE",
year = "2021",
doi = "10.1109/TPAMI.2021.3092833",
language = "English",
journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
issn = "0162-8828",
publisher = "IEEE Computer Society",
}

TY  - JOUR
T1  - A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification
AU  - Rao, Haocong
AU  - Wang, Siqi
AU  - Hu, Xiping
AU  - Tan, Mingkui
AU  - Guo, Yi
AU  - Cheng, Jun
AU  - Liu, Xinwang
AU  - Hu, Bin
N1  - Publisher Copyright: IEEE
PY  - 2021
Y1  - 2021
N2  - Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions rely on either hand-crafted descriptors or supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences in reverse order, which involves richer high-level semantics to obtain better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion's continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40% in Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods with extra RGB or depth information.
AB  - Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions rely on either hand-crafted descriptors or supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences in reverse order, which involves richer high-level semantics to obtain better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion's continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40% in Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods with extra RGB or depth information.
KW  - Computational modeling
KW  - Contrastive Learning
KW  - Encoding
KW  - Feature extraction
KW  - Gait
KW  - Locality-Aware Attention
KW  - Self-Supervised Deep Learning
KW  - Skeleton
KW  - Skeleton Based Person Re-Identification
KW  - Solid modeling
KW  - Task analysis
KW  - Three-dimensional displays
UR  - http://www.scopus.com/inward/record.url?scp=85112166817&partnerID=8YFLogxK
U2  - 10.1109/TPAMI.2021.3092833
DO  - 10.1109/TPAMI.2021.3092833
M3  - Article
C2  - 34181534
AN  - SCOPUS:85112166817
SN  - 0162-8828
JO  - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF  - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER  -
