Online spatio-temporal action detection with adaptive sampling and hierarchical modulation

Shaowen Su*, Minggang Gan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Online spatio-temporal action detection (OSTAD) is a crucial task in video understanding: it localizes and classifies action instances in video streams in an online manner. This paper presents an approach that uses adaptive sampling and hierarchical modulation to improve OSTAD. Traditional methods, often constrained by fixed sampling rates, can sample redundantly when actions are slow and miss essential details when actions are fast. Our dynamic sampling strategy, informed by speed estimation, adjusts sampling intervals based on speed attention and visual differential features, thereby optimizing the informational content of each sampled video clip. Additionally, our method incorporates a hierarchical modulation mechanism that combines high-level semantic and low-level spatial information, significantly improving action localization and classification accuracy. The resulting adaptive sampling network with hierarchical modulation shows substantial improvements on benchmark datasets such as JHMDB21 and UCF24, demonstrating the method's efficacy in handling diverse and dynamic action sequences in an online setting.
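
The adaptive sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' network: the learned speed estimate is replaced here by a plain mean absolute frame difference, and every name and constant (`motion_score`, `sampling_interval`, `sample_clip`, `scale`, the step bounds) is a hypothetical choice made only for illustration. It shows the general behaviour described: a short sampling step when motion is fast, a long step when motion is slow, using only past frames as an online setting requires.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flattened list of grayscale values in [0, 1].
Frame = Sequence[float]


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference; a crude stand-in for a learned speed estimate."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def sampling_interval(score: float, min_step: int = 1, max_step: int = 8,
                      scale: float = 0.1) -> int:
    """Map a motion score to a frame step: fast motion -> small step, slow -> large step."""
    # `scale` is an assumed constant: scores at or above it count as full speed.
    speed = min(score / scale, 1.0)
    return round(max_step - speed * (max_step - min_step))


def sample_clip(frames: List[Frame], clip_len: int = 4) -> List[int]:
    """Pick frame indices for one clip, walking backwards from the newest frame.

    Each step is chosen from the motion between the current frame and the one
    before it, so no future frames are needed.
    """
    idx = len(frames) - 1
    picked = [idx]
    while len(picked) < clip_len and idx > 0:
        step = sampling_interval(motion_score(frames[idx - 1], frames[idx]))
        idx = max(idx - step, 0)
        picked.append(idx)
    return picked[::-1]  # return indices in chronological order


if __name__ == "__main__":
    static = [[0.0] * 4 for _ in range(20)]            # no motion at all
    fast = [[float(i % 2)] * 4 for i in range(20)]     # every pixel flips each frame
    print(sample_clip(static))
    print(sample_clip(fast))
```

With a static stream the sampled indices spread across the whole buffer, while a rapidly changing stream yields consecutive frames; a learned speed-attention module would take the place of `motion_score` in a real system.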

Original language: English
Article number: 349
Journal: Multimedia Systems
Volume: 30
Issue number: 6
Publication status: Published - Dec 2024

Keywords

  • Adaptive sampling
  • Hierarchical modulation
  • Online
  • Spatio-temporal action detection
  • Speed estimation
