Weakly-supervised action localization via embedding-modeling iterative optimization

Xiao Yu Zhang*, Haichao Shi, Changsheng Li, Peng Li, Zekun Li, Peng Ren

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

Action recognition and localization in untrimmed videos under weak supervision is a challenging problem with great application prospects. Limited by the information available in video-level labels, it is a promising attempt to fully leverage the instructive knowledge learned on trimmed videos to facilitate the analysis of untrimmed videos, since abundant trimmed videos are publicly available, well segmented, and accompanied by semantic descriptions. To enable effective trimmed-untrimmed augmentation, this paper presents a novel framework, the embedding-modeling iterative optimization network, referred to as IONet. In the proposed method, action classification modeling and shared subspace embedding are learned jointly in an iterative way, so that robust cross-domain knowledge transfer is achieved. With a carefully designed two-stage self-attentive representation learning workflow for untrimmed videos, irrelevant backgrounds are eliminated and fine-grained temporal relevance can be robustly explored. Extensive experiments are conducted on two benchmark datasets, THUMOS14 and ActivityNet1.3, and the results clearly corroborate the efficacy of our method. Source code is available on GitHub.
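The abstract mentions a self-attentive representation learning workflow that suppresses irrelevant backgrounds in untrimmed videos. As a rough illustration of that general idea (not the authors' exact IONet architecture), the sketch below shows a common weakly-supervised pattern: a temporal attention branch scores clip-level features, and the attention-pooled video feature is classified with video-level labels only. The feature dimension, class count, and module names are assumptions for illustration.

```python
# Minimal sketch of temporal self-attention for weakly-supervised action
# localization. Assumes clip-level features (T, D) from a pretrained backbone;
# this is an illustrative approximation, not the paper's exact IONet model.
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        # Attention branch: scores how likely each clip contains an action.
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )
        # Classification branch over the attention-pooled video feature.
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clip_feats):
        # clip_feats: (T, D) clip-level features of one untrimmed video.
        attn_logits = self.attention(clip_feats)      # (T, 1)
        attn = torch.softmax(attn_logits, dim=0)      # temporal attention weights
        video_feat = (attn * clip_feats).sum(dim=0)   # background-suppressed pooling, (D,)
        video_logits = self.classifier(video_feat)    # video-level class scores
        return video_logits, attn.squeeze(-1)

# Usage: train with video-level labels only; at test time, threshold the
# attention weights (and per-clip class activations) to localize actions.
feats = torch.randn(120, 1024)   # 120 clips, hypothetical features
model = TemporalSelfAttention()
logits, attn = model(feats)
```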

Original language: English
Article number: 107831
Journal: Pattern Recognition
Volume: 113
DOIs
Publication status: Published - May 2021

Keywords

  • Action recognition
  • Attention mechanism
  • Generative adversarial networks
  • Subspace embedding
  • Temporal action localization
