You can only watch the past: track attention network for online spatio-temporal action detection

  • Shaowen Su*
  • Minggang Gan
  • Yan Zhang

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Online spatio-temporal action detection (OSTAD) aims to identify and localize action instances in real-time video streams without accessing future frames. However, the online setting imposes strict constraints of incremental inference, limited memory, and causal processing, which severely restrict the availability of effective information. To address this, we propose the track attention network (TAN), introducing a history-aware track-and-detect paradigm. Instead of detecting actions independently at each frame, TAN leverages historical detection results and spatio-temporal continuity to enhance current-frame features. Specifically, we propose three strategies. First, a history-aware actor distribution prediction strategy estimates current actor distributions based on spatial continuity and appearance similarity. Second, an actor distribution inference strategy via track attention introduces two attention modules—track channel attention and track efficient attention—to model semantic relations among actor distributions for robust fusion. Third, a history-aware feature modulation strategy injects localization priors from actor distributions into action features, improving representation quality and detection accuracy. Extensive experiments on the JHMDB21 and UCF24 benchmarks demonstrate the effectiveness of our method. TAN achieves 80.3% frame-level mAP (f-mAP) and 88.3% video-level mAP (v-mAP) on JHMDB21, and 88.1% f-mAP and 54.8% v-mAP on UCF24, outperforming existing online methods and even several offline approaches.
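The online constraints and the history-aware idea described in the abstract can be illustrated with a toy sketch. This is not the TAN architecture: a fixed Gaussian around the previous detection stands in for the learned actor distribution, and a simple multiplicative rule stands in for the feature modulation strategy. All function names, the `sigma` and `strength` parameters, and the score grids are assumptions made for illustration only.

```python
import math
from collections import deque


def gaussian_prior(height, width, centers, sigma=1.5):
    """Toy actor distribution: high near previously detected centers.

    Assumption: a Gaussian models spatial continuity; the paper's learned
    prediction (which also uses appearance similarity) is not reproduced.
    """
    prior = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        for y in range(height):
            for x in range(width):
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                weight = math.exp(-d2 / (2 * sigma ** 2))
                prior[y][x] = max(prior[y][x], weight)
    return prior


def modulate(scores, prior, strength=1.0):
    """Hypothetical modulation rule: boost scores where the prior is high."""
    return [
        [s * (1.0 + strength * p) for s, p in zip(score_row, prior_row)]
        for score_row, prior_row in zip(scores, prior)
    ]


def argmax2d(grid):
    """Return (row, col) of the largest value in a 2D list."""
    best = max((v, y, x) for y, row in enumerate(grid) for x, v in enumerate(row))
    return best[1], best[2]


def run_online(frames, history_len=4):
    """Process per-frame score grids strictly in order, keeping only past results."""
    history = deque(maxlen=history_len)  # bounded memory of past detections
    detections = []
    for scores in frames:  # causal: frame t never sees frame t+1
        if history:
            prior = gaussian_prior(len(scores), len(scores[0]), [history[-1]])
            scores = modulate(scores, prior)
        center = argmax2d(scores)
        history.append(center)
        detections.append(center)
    return detections
```

In this toy setting, a distractor that scores slightly higher than the true actor in the current frame can be outvoted when the true actor lies close to the previous detection, which is the intuition behind using history in place of future frames. The `deque(maxlen=...)` buffer keeps memory constant regardless of stream length.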

Original language: English
Article number: 122107
Journal: Science China Information Sciences
Volume: 69
Issue number: 2
Publication status: Published - Feb 2026
Externally published: Yes

Keywords

  • actor distribution
  • historical detection
  • online spatio-temporal action detection
  • track attention
  • track-and-detect
