Abstract
Skeleton-based action recognition underpins complex activity analysis, yet current Graph Convolutional Network (GCN) paradigms are constrained by three bottlenecks: rigid, predefined spatial topologies that fail to adapt to dynamic motion variations; channel-wise topology homogenization, in which all feature channels share one topology, limiting multi-level semantic expressiveness; and disjointed spatio-temporal modeling hampered by restricted temporal receptive fields. To address these gaps, we propose the Multi-scale Dynamic Topology Spatio-Temporal Graph Convolutional Network (MD-STGCN). MD-STGCN integrates two complementary modules: a Dynamic Topology Graph Convolution (DT-GC) module, which employs a learnable channel-wise topology gate to produce channel-adaptive sub-topologies and adaptive edge weights for finer spatial reasoning; and a Multi-Scale Temporal Convolution (MS-TC) module, which uses a lightweight multi-branch design with depthwise-separable temporal convolutions to efficiently capture both short- and long-range temporal dependencies. Extensive experiments on NTU RGB+D and NTU RGB+D 120 demonstrate that MD-STGCN achieves state-of-the-art performance: on the NTU RGB+D cross-view benchmark, the joint stream reaches a Top-1 accuracy of 99.0%, an absolute gain of 1.6 percentage points over STGCN++. Moreover, MD-STGCN requires only 0.33M parameters and 0.45 GFLOPs, a substantial reduction in model complexity compared with CTR-GCN and MS-G3D, and thus offers a superior accuracy–efficiency trade-off. These properties make MD-STGCN a promising candidate for resource-constrained scenarios, particularly real-time surveillance and human–computer interaction systems.
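The two modules can be illustrated with a toy, dependency-free sketch. It is not the authors' implementation, and every name and form below is a hypothetical assumption: the DT-GC gate is modelled as one sigmoid-scaled scalar per channel that mixes a shared adjacency with a learnable dynamic adjacency (so each channel c aggregates over its own topology A_c = A_shared + sigmoid(g_c) * A_dyn), and the MS-TC branches are assumed to differ only in dilation. The helper functions also count weights to show why depthwise-separable temporal convolutions are lightweight (a k-tap depthwise filter per input channel plus a 1x1 pointwise mix, instead of a full k x C_in x C_out kernel).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_adaptive_graph_conv(x, a_shared, a_dyn, gate_logits):
    """Toy channel-wise topology gating for one frame (hypothetical form).

    x:           C lists of V joint features
    a_shared:    V x V shared (e.g. skeleton-derived) adjacency
    a_dyn:       V x V learnable dynamic edge weights
    gate_logits: C learnable gate scores, one per channel
    Channel c aggregates neighbours with A_c = A_shared + sigmoid(g_c) * A_dyn.
    """
    num_v = len(a_shared)
    out = []
    for c, feats in enumerate(x):
        g = sigmoid(gate_logits[c])
        row = []
        for v in range(num_v):
            acc = 0.0
            for u in range(num_v):
                acc += (a_shared[v][u] + g * a_dyn[v][u]) * feats[u]
            row.append(acc)
        out.append(row)
    return out


def temporal_receptive_field(kernel: int, dilation: int) -> int:
    # Frames covered by one dilated temporal convolution.
    return dilation * (kernel - 1) + 1


def standard_tconv_params(c_in: int, c_out: int, k: int) -> int:
    # Full k x 1 temporal kernel for every (input, output) channel pair, no bias.
    return k * c_in * c_out


def depthwise_separable_tconv_params(c_in: int, c_out: int, k: int) -> int:
    # Depthwise k x 1 filter per input channel, then 1 x 1 pointwise mixing, no bias.
    return k * c_in + c_in * c_out


if __name__ == "__main__":
    # Hypothetical branch configuration: kernel 3 with dilations 1..4.
    print([temporal_receptive_field(3, d) for d in (1, 2, 3, 4)])  # [3, 5, 7, 9]
    print(standard_tconv_params(64, 64, 5))             # 20480
    print(depthwise_separable_tconv_params(64, 64, 5))  # 4416
```

With 64 channels and a 5-frame kernel, the separable form needs 4,416 weights instead of 20,480, while parallel dilated branches widen the temporal receptive field at almost no extra cost; the actual branch count, kernel sizes, and gate parameterization of MS-TC and DT-GC are not specified here.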
| Original language | English |
|---|---|
| Article number | 123572 |
| Journal | Information Sciences |
| Volume | 752 |
| Publication status | Published - 5 Oct 2026 |
| Externally published | Yes |
Keywords
- Dynamic topology
- Graph convolutional network
- Multi-scale temporal convolution
- Skeleton-based action recognition
- Spatio-temporal feature learning
Title: MD-STGCN: Dynamic topology and multi-scale temporal modeling for skeleton-based action recognition