TY - JOUR
T1 - MD-STGCN
T2 - Dynamic topology and multi-scale temporal modeling for skeleton-based action recognition
AU - Zhu, Shiran
AU - Li, Ronghua
AU - Hu, Henan
N1 - Publisher Copyright: © 2026
PY - 2026/10/5
Y1 - 2026/10/5
AB - Skeleton-based action recognition is a cornerstone for complex activity analysis, yet current Graph Convolutional Network (GCN) paradigms are constrained by three bottlenecks: rigid, predefined spatial topologies that fail to adapt to dynamic motion variations; channel-wise topology homogenization that limits multi-level semantic expressiveness; and disjointed spatio-temporal modeling hampered by restricted temporal receptive fields. To address these gaps, we propose the Multi-scale Dynamic Topology Spatio-Temporal Graph Convolutional Network (MD-STGCN). MD-STGCN integrates two complementary modules: a Dynamic Topology Graph Convolution (DT-GC) module that employs a learnable channel-wise topology gate to produce channel-adaptive sub-topologies and adaptive edge weights for finer spatial reasoning, and a Multi-Scale Temporal Convolution (MS-TC) module that uses a lightweight multi-branch design with depthwise-separable temporal convolutions to efficiently capture both short- and long-range temporal dependencies. Extensive experiments on NTU RGB+D and NTU RGB+D 120 demonstrate that MD-STGCN achieves state-of-the-art performance: the joint stream achieves a Top-1 accuracy of 99.0% on NTU cross-view, yielding a 1.6% absolute gain over STGCN++. Moreover, MD-STGCN requires 0.33M parameters and 0.45 GFLOPs, representing a substantial reduction in model complexity compared with CTR-GCN and MS-G3D, thereby offering a superior accuracy–efficiency trade-off. MD-STGCN is a promising candidate for resource-constrained scenarios, particularly in real-time surveillance and human–computer interaction systems.
KW - Dynamic topology
KW - Graph convolutional network
KW - Multi-scale temporal convolution
KW - Skeleton-based action recognition
KW - Spatio-temporal feature learning
UR - https://www.scopus.com/pages/publications/105037971566
U2 - 10.1016/j.ins.2026.123572
DO - 10.1016/j.ins.2026.123572
M3 - Article
AN - SCOPUS:105037971566
SN - 0020-0255
VL - 752
JO - Information Sciences
JF - Information Sciences
M1 - 123572
ER -