Exploiting human pose for weakly-supervised temporal action localization

Bing Zhu; Tianyu Li; Xinxiao Wu

doi:10.1007/978-3-030-31726-3_40

Bing Zhu, Tianyu Li, Xinxiao Wu^*

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Weakly-supervised temporal action localization aims to predict when and what actions occur in untrimmed videos with only video-level class labels. Most current methods make predictions based on global features, while ignoring the classification performance of local descriptions of the human body. Additionally, these methods generate incomplete proposals via thresholding, which is too simplistic and crude. To acquire high-quality proposals, we focus on incorporating local information, i.e. human body poses in videos, and propose a novel method called Class Activation and Pose Pattern (CAPP) for weakly-supervised temporal action localization. In our method, action proposals are generated by two modules: a Class Activation Sequence (CAS) module and a Pose Pattern Sequence (PPS) module. The CAS module fuses global features and local features to improve clip-level classification performance, and the PPS module adds complementary proposals with high recall via pose pattern clustering. CAPP outperforms the state-of-the-art methods on both the THUMOS-14 and ActivityNet v1.2 datasets, which demonstrates the effectiveness of our method.
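
For readers unfamiliar with the thresholding step the abstract criticizes, the following is a minimal sketch of how per-clip activation scores are commonly turned into temporal proposals. It is not the CAPP method and not taken from the paper: the function name `threshold_proposals`, the example scores, and the threshold value are all illustrative assumptions.

```python
from typing import List, Tuple


def threshold_proposals(scores: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return [start, end) clip-index intervals whose scores exceed the threshold.

    Hypothetical helper for illustration only; it is not the authors' code.
    """
    proposals: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i  # a new above-threshold run begins
        elif score <= threshold and start is not None:
            proposals.append((start, i))  # the run ended at the previous clip
            start = None
    if start is not None:
        proposals.append((start, len(scores)))  # run extends to the last clip
    return proposals


# Made-up activation scores for one action class over seven clips.
cas = [0.1, 0.7, 0.8, 0.3, 0.6, 0.9, 0.2]
print(threshold_proposals(cas, 0.5))  # [(1, 3), (4, 6)]
```

In this made-up example a single dip below the threshold at clip 3 splits what may be one continuous action into two shorter proposals, which is one way thresholding alone can yield incomplete proposals.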

Original language	English
Title of host publication	Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III
Editors	Zhouchen Lin, Liang Wang, Tieniu Tan, Jian Yang, Guangming Shi, Nanning Zheng, Xilin Chen, Yanning Zhang
Publisher	Springer
Pages	466-478
Number of pages	13
ISBN (Print)	9783030317256
DOIs	https://doi.org/10.1007/978-3-030-31726-3_40
Publication status	Published - 2019
Event	2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019 - Xi’an, China Duration: 8 Nov 2019 → 11 Nov 2019

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11859 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Country/Territory	China
City	Xi’an
Period	8 Nov 2019 → 11 Nov 2019

Keywords

Pose estimation
Temporal action localization
Weakly supervised

Access to Document

10.1007/978-3-030-31726-3_40

Cite this

Zhu, B., Li, T., & Wu, X. (2019). Exploiting human pose for weakly-supervised temporal action localization. In Z. Lin, L. Wang, T. Tan, J. Yang, G. Shi, N. Zheng, X. Chen, & Y. Zhang (Eds.), Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III (pp. 466-478). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11859 LNCS). Springer. https://doi.org/10.1007/978-3-030-31726-3_40

Zhu, Bing ; Li, Tianyu ; Wu, Xinxiao. / Exploiting human pose for weakly-supervised temporal action localization. Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III. editor / Zhouchen Lin ; Liang Wang ; Tieniu Tan ; Jian Yang ; Guangming Shi ; Nanning Zheng ; Xilin Chen ; Yanning Zhang. Springer, 2019. pp. 466-478 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{2e4c94af0826403788aaa1c986823bfb,

title = "Exploiting human pose for weakly-supervised temporal action localization",

abstract = "Weakly-supervised temporal action localization aims to predict when and what actions occur in untrimmed videos with only video-level class labels. Most current methods make predictions based on global features, while ignoring the classification performance of local descriptions of the human body. Additionally, these methods generate incomplete proposals via thresholding, which is too simplistic and crude. To acquire high-quality proposals, we focus on incorporating local information, i.e. human body poses in videos, and propose a novel method called Class Activation and Pose Pattern (CAPP) for weakly-supervised temporal action localization. In our method, action proposals are generated by two modules: a Class Activation Sequence (CAS) module and a Pose Pattern Sequence (PPS) module. The CAS module fuses global features and local features to improve clip-level classification performance, and the PPS module adds complementary proposals with high recall via pose pattern clustering. CAPP outperforms the state-of-the-art methods on both the THUMOS-14 and ActivityNet v1.2 datasets, which demonstrates the effectiveness of our method.",

keywords = "Pose estimation, Temporal action localization, Weakly supervised",

author = "Bing Zhu and Tianyu Li and Xinxiao Wu",

note = "Publisher Copyright: {\textcopyright} Springer Nature Switzerland AG 2019.; 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019 ; Conference date: 08-11-2019 Through 11-11-2019",

year = "2019",

doi = "10.1007/978-3-030-31726-3_40",

language = "English",

isbn = "9783030317256",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "466--478",

editor = "Zhouchen Lin and Liang Wang and Tieniu Tan and Jian Yang and Guangming Shi and Nanning Zheng and Xilin Chen and Yanning Zhang",

booktitle = "Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III",

address = "Germany",

}

Zhu, B, Li, T & Wu, X 2019, Exploiting human pose for weakly-supervised temporal action localization. in Z Lin, L Wang, T Tan, J Yang, G Shi, N Zheng, X Chen & Y Zhang (eds), Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11859 LNCS, Springer, pp. 466-478, 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019, Xi’an, China, 8/11/19. https://doi.org/10.1007/978-3-030-31726-3_40

Exploiting human pose for weakly-supervised temporal action localization. / Zhu, Bing; Li, Tianyu; Wu, Xinxiao.
Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III. ed. / Zhouchen Lin; Liang Wang; Tieniu Tan; Jian Yang; Guangming Shi; Nanning Zheng; Xilin Chen; Yanning Zhang. Springer, 2019. p. 466-478 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11859 LNCS).

TY - GEN

T1 - Exploiting human pose for weakly-supervised temporal action localization

AU - Zhu, Bing

AU - Li, Tianyu

AU - Wu, Xinxiao

N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.

PY - 2019

Y1 - 2019

N2 - Weakly-supervised temporal action localization aims to predict when and what actions occur in untrimmed videos with only video-level class labels. Most current methods make predictions based on global features, while ignoring the classification performance of local descriptions of the human body. Additionally, these methods generate incomplete proposals via thresholding, which is too simplistic and crude. To acquire high-quality proposals, we focus on incorporating local information, i.e. human body poses in videos, and propose a novel method called Class Activation and Pose Pattern (CAPP) for weakly-supervised temporal action localization. In our method, action proposals are generated by two modules: a Class Activation Sequence (CAS) module and a Pose Pattern Sequence (PPS) module. The CAS module fuses global features and local features to improve clip-level classification performance, and the PPS module adds complementary proposals with high recall via pose pattern clustering. CAPP outperforms the state-of-the-art methods on both the THUMOS-14 and ActivityNet v1.2 datasets, which demonstrates the effectiveness of our method.

AB - Weakly-supervised temporal action localization aims to predict when and what actions occur in untrimmed videos with only video-level class labels. Most current methods make predictions based on global features, while ignoring the classification performance of local descriptions of the human body. Additionally, these methods generate incomplete proposals via thresholding, which is too simplistic and crude. To acquire high-quality proposals, we focus on incorporating local information, i.e. human body poses in videos, and propose a novel method called Class Activation and Pose Pattern (CAPP) for weakly-supervised temporal action localization. In our method, action proposals are generated by two modules: a Class Activation Sequence (CAS) module and a Pose Pattern Sequence (PPS) module. The CAS module fuses global features and local features to improve clip-level classification performance, and the PPS module adds complementary proposals with high recall via pose pattern clustering. CAPP outperforms the state-of-the-art methods on both the THUMOS-14 and ActivityNet v1.2 datasets, which demonstrates the effectiveness of our method.

KW - Pose estimation

KW - Temporal action localization

KW - Weakly supervised

UR - http://www.scopus.com/inward/record.url?scp=85084411854&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-31726-3_40

DO - 10.1007/978-3-030-31726-3_40

M3 - Conference contribution

AN - SCOPUS:85084411854

SN - 9783030317256

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 466

EP - 478

BT - Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III

A2 - Lin, Zhouchen

A2 - Wang, Liang

A2 - Tan, Tieniu

A2 - Yang, Jian

A2 - Shi, Guangming

A2 - Zheng, Nanning

A2 - Chen, Xilin

A2 - Zhang, Yanning

PB - Springer

T2 - 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019

Y2 - 8 November 2019 through 11 November 2019

ER -

Zhu B, Li T, Wu X. Exploiting human pose for weakly-supervised temporal action localization. In Lin Z, Wang L, Tan T, Yang J, Shi G, Zheng N, Chen X, Zhang Y, editors, Pattern Recognition and Computer Vision- 2nd Chinese Conference, PRCV 2019, Proceedings, Part III. Springer. 2019. p. 466-478. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-31726-3_40
