Exploiting human pose for weakly-supervised temporal action localization

Bing Zhu, Tianyu Li, Xinxiao Wu*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Weakly-supervised temporal action localization aims to predict when and what actions occur in untrimmed videos using only video-level class labels. Most current methods make predictions based on global features while ignoring what local descriptions of the human body can contribute to classification. Additionally, these methods generate incomplete proposals via thresholding, which is an overly simple and crude strategy. To acquire high-quality proposals, we focus on incorporating local information, i.e., human body poses in videos, and propose a novel method called Class Activation and Pose Pattern (CAPP) for weakly-supervised temporal action localization. In our method, action proposals are generated by two modules: a Class Activation Sequence (CAS) module and a Pose Pattern Sequence (PPS) module. The CAS module fuses global and local features to improve clip-level classification performance, and the PPS module adds complementary proposals with high recall via pose pattern clustering. CAPP outperforms state-of-the-art methods on both the THUMOS-14 and ActivityNet v1.2 datasets, which demonstrates the effectiveness of our method.
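The thresholding step that the abstract criticizes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `threshold_proposals`, the default threshold of 0.5, and the example scores are all hypothetical, chosen only to show how a single cut-off on a class activation sequence can split one action into fragments.

```python
from typing import List, Tuple


def threshold_proposals(cas: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive clips whose activation is >= threshold.

    `cas` is a hypothetical per-clip activation sequence for one action class.
    Returns half-open (start, end) clip-index ranges.
    """
    proposals: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, score in enumerate(cas):
        if score >= threshold and start is None:
            start = i  # a new run begins
        elif score < threshold and start is not None:
            proposals.append((start, i))  # the run ends before clip i
            start = None
    if start is not None:
        proposals.append((start, len(cas)))  # run extends to the last clip
    return proposals


# Illustrative scores only: the dip at clip 3 splits what may be one action.
example_cas = [0.1, 0.7, 0.8, 0.4, 0.9, 0.2]
example_proposals = threshold_proposals(example_cas)
```

With these made-up scores the single dip below the threshold yields two short proposals instead of one, which is the kind of incompleteness that complementary proposals, such as those the abstract attributes to the PPS module, are meant to offset.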

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part III
Editors: Zhouchen Lin, Liang Wang, Tieniu Tan, Jian Yang, Guangming Shi, Nanning Zheng, Xilin Chen, Yanning Zhang
Publisher: Springer
Pages: 466-478
Number of pages: 13
ISBN (Print): 9783030317256
DOIs
Publication status: Published - 2019
Event: 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019 - Xi’an, China
Duration: 8 Nov 2019 to 11 Nov 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11859 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Country/Territory: China
City: Xi’an
Period: 8/11/19 to 11/11/19

Keywords

  • Pose estimation
  • Temporal action localization
  • Weakly supervised
