Learning weighted video segments for temporal action localization

Che Sun, Hao Song, Xinxiao Wu*, Yunde Jia

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

This paper proposes a novel approach to temporal action localization in untrimmed videos that learns weighted video segments via supervised temporal attention. The learned segment weights represent how informative each video segment is for recognizing actions, and they help infer the temporal boundaries of actions. We build a Supervised Temporal Attention Network (STAN) that dynamically learns the segment weights and generates descriptive and discriminative video representations. A proposal generator estimates the action boundaries, and a classifier predicts the action classes. Extensive experiments are conducted on two public benchmarks, THUMOS2014 and ActivityNet1.3. The results show that our approach achieves substantially better performance than state-of-the-art methods, verifying the effectiveness of learning weighted video segments.
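
The idea of weighting video segments with temporal attention can be illustrated with a minimal sketch. This is not the authors' STAN: the abstract does not specify the network architecture or the supervision signal, so the linear scoring vector `w`, the function names, and the random features below are all hypothetical. The sketch only shows how per-segment weights can be obtained with a softmax and used to pool segment features into a single video representation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def weighted_video_representation(segment_features, w, b=0.0):
    """Pool segment features with attention weights.

    segment_features: array of shape (T, D), one row per video segment.
    w, b: hypothetical linear scoring parameters (an assumption, not
          taken from the paper).
    Returns (weights of shape (T,), pooled representation of shape (D,)).
    """
    scores = segment_features @ w + b       # one scalar score per segment
    weights = softmax(scores)               # non-negative, sums to 1
    video_repr = weights @ segment_features # weighted sum over segments
    return weights, video_repr


# Toy input: 8 segments with 16-dimensional features (random, illustrative only).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
w = rng.normal(size=16)

weights, video_repr = weighted_video_representation(features, w)
```

In a supervised setting, the weights would additionally be trained against a target derived from annotated action boundaries; that step is omitted here because the abstract does not describe it.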

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part I
Editors: Zhouchen Lin, Liang Wang, Tieniu Tan, Jian Yang, Guangming Shi, Nanning Zheng, Xilin Chen, Yanning Zhang
Publisher: Springer
Pages: 181-192
Number of pages: 12
ISBN (Print): 9783030316532
Publication status: Published - 2019
Event: 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019 - Xi'an, China
Duration: 8 Nov 2019 – 11 Nov 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11857 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Country/Territory: China
City: Xi'an
Period: 8/11/19 – 11/11/19

Keywords

  • Attention mechanism
  • Temporal action localization
  • Weighted video segments
