TY - JOUR
T1 - Dual-stream structured graph convolution network for skeleton-based action recognition
AU - Xu, Chunyan
AU - Liu, Rong
AU - Zhang, Tong
AU - Cui, Zhen
AU - Yang, Jian
AU - Hu, Chunlong
N1 - Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/11
Y1 - 2021/11
N2 - In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) to solve the skeleton-based action recognition problem. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the graph convolution learning process on both the video and skeleton modalities. To effectively represent the skeletal graph of discrete joints, we create a structured graph convolution module specifically designed to encode partitioned body parts along with their dynamic interactions in the spatio-temporal sequence. In more detail, we build a set of structured intra-part graphs, each of which can be adopted to represent a distinctive body part (e.g., left arm, right leg, head). The inter-part graph is then constructed to model the dynamic interactions across different body parts; here each node corresponds to an intra-part graph built above, while an edge between two nodes is used to express these internal relationships of human movement. We implement the graph convolution learning on both intra- and inter-part graphs in order to obtain the inherent characteristics and dynamic interactions, respectively, of human action. After integrating the intra- and inter-levels of spatial context/coordinate cues, a convolution filtering process is conducted on time slices to capture these temporal dynamics of human motion. Finally, we fuse two streams of graph convolution responses in order to predict the category information of human action in an end-to-end fashion. Comprehensive experiments on five single/multi-modal benchmark datasets (including NTU RGB+D 60, NTU RGB+D 120, MSR-Daily 3D, N-UCLA, and HDM05) demonstrate that the proposed DS-SGCN framework achieves encouraging performance on the skeleton-based action recognition task.
AB - In this work, we propose a dual-stream structured graph convolution network (DS-SGCN) to solve the skeleton-based action recognition problem. The spatio-temporal coordinates and appearance contexts of the skeletal joints are jointly integrated into the graph convolution learning process on both the video and skeleton modalities. To effectively represent the skeletal graph of discrete joints, we create a structured graph convolution module specifically designed to encode partitioned body parts along with their dynamic interactions in the spatio-temporal sequence. In more detail, we build a set of structured intra-part graphs, each of which can be adopted to represent a distinctive body part (e.g., left arm, right leg, head). The inter-part graph is then constructed to model the dynamic interactions across different body parts; here each node corresponds to an intra-part graph built above, while an edge between two nodes is used to express these internal relationships of human movement. We implement the graph convolution learning on both intra- and inter-part graphs in order to obtain the inherent characteristics and dynamic interactions, respectively, of human action. After integrating the intra- and inter-levels of spatial context/coordinate cues, a convolution filtering process is conducted on time slices to capture these temporal dynamics of human motion. Finally, we fuse two streams of graph convolution responses in order to predict the category information of human action in an end-to-end fashion. Comprehensive experiments on five single/multi-modal benchmark datasets (including NTU RGB+D 60, NTU RGB+D 120, MSR-Daily 3D, N-UCLA, and HDM05) demonstrate that the proposed DS-SGCN framework achieves encouraging performance on the skeleton-based action recognition task.
KW - Action recognition
KW - Dual-stream structured graph convolution
KW - Graph convolution network
UR - http://www.scopus.com/inward/record.url?scp=85123316823&partnerID=8YFLogxK
U2 - 10.1145/3450410
DO - 10.1145/3450410
M3 - Article
AN - SCOPUS:85123316823
SN - 1551-6857
VL - 17
JO - ACM Transactions on Multimedia Computing, Communications and Applications
JF - ACM Transactions on Multimedia Computing, Communications and Applications
IS - 4
M1 - 3450410
ER -