TY - GEN
T1 - Exploiting human pose for weakly-supervised temporal action localization
AU - Zhu, Bing
AU - Li, Tianyu
AU - Wu, Xinxiao
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - Weakly-supervised temporal action localization aims to predict when and what actions occur in untrimmed videos with only video-level class labels. Most current methods make predictions based on global features, while ignoring the classification performance of local descriptions of the human body. Additionally, these methods generate incomplete proposals via thresholding, which is too simplistic and crude. To acquire high-quality proposals, we focus on incorporating local information, i.e., human body poses in videos, and propose a novel method called Class Activation and Pose Pattern (CAPP) for weakly-supervised temporal action localization. In our method, action proposals are generated by two modules: a Class Activation Sequence (CAS) module and a Pose Pattern Sequence (PPS) module. The CAS module fuses global features and local features to improve clip-level classification performance, and the PPS module adds complementary proposals with high recall via pose pattern clustering. CAPP outperforms the state-of-the-art methods on both the THUMOS-14 and ActivityNet v1.2 datasets, which demonstrates the effectiveness of our method.
KW - Pose estimation
KW - Temporal action localization
KW - Weakly supervised
UR - http://www.scopus.com/inward/record.url?scp=85084411854&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31726-3_40
DO - 10.1007/978-3-030-31726-3_40
M3 - Conference contribution
AN - SCOPUS:85084411854
SN - 9783030317256
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 466
EP - 478
BT - Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part III
A2 - Lin, Zhouchen
A2 - Wang, Liang
A2 - Tan, Tieniu
A2 - Yang, Jian
A2 - Shi, Guangming
A2 - Zheng, Nanning
A2 - Chen, Xilin
A2 - Zhang, Yanning
PB - Springer
T2 - 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Y2 - 8 November 2019 through 11 November 2019
ER -