TY - GEN
T1 - Weakly-supervised action recognition and localization via knowledge transfer
AU - Shi, Haichao
AU - Zhang, Xiaoyu
AU - Li, Changsheng
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - Action recognition and localization have attracted much attention in the past decade. However, a challenging problem is that training models in untrimmed video scenarios typically requires large-scale temporal annotations of action instances, which is impractical in many real-world applications. To alleviate this problem, we propose KTUntrimmedNet, a novel weakly-supervised action recognition framework for untrimmed videos that uses only video-level annotations and transfers information from publicly available trimmed videos to assist model learning. A two-stage method is designed to guarantee an effective transfer strategy: first, the trimmed and untrimmed videos are clustered to find similar classes between them, so as to avoid negative information transfer from the trimmed data; second, we design an invariant module to find common features between trimmed and untrimmed videos to improve performance. Extensive experiments on the standard benchmark datasets THUMOS14 and ActivityNet1.3 clearly demonstrate the efficacy of our proposed method compared with the existing state of the art.
AB - Action recognition and localization have attracted much attention in the past decade. However, a challenging problem is that training models in untrimmed video scenarios typically requires large-scale temporal annotations of action instances, which is impractical in many real-world applications. To alleviate this problem, we propose KTUntrimmedNet, a novel weakly-supervised action recognition framework for untrimmed videos that uses only video-level annotations and transfers information from publicly available trimmed videos to assist model learning. A two-stage method is designed to guarantee an effective transfer strategy: first, the trimmed and untrimmed videos are clustered to find similar classes between them, so as to avoid negative information transfer from the trimmed data; second, we design an invariant module to find common features between trimmed and untrimmed videos to improve performance. Extensive experiments on the standard benchmark datasets THUMOS14 and ActivityNet1.3 clearly demonstrate the efficacy of our proposed method compared with the existing state of the art.
KW - Action localization
KW - Action recognition
KW - Knowledge transfer
UR - http://www.scopus.com/inward/record.url?scp=85086141617&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31654-9_18
DO - 10.1007/978-3-030-31654-9_18
M3 - Conference contribution
AN - SCOPUS:85086141617
SN - 9783030316532
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 205
EP - 216
BT - Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part I
A2 - Lin, Zhouchen
A2 - Wang, Liang
A2 - Tan, Tieniu
A2 - Yang, Jian
A2 - Shi, Guangming
A2 - Zheng, Nanning
A2 - Chen, Xilin
A2 - Zhang, Yanning
PB - Springer
T2 - 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Y2 - 8 November 2019 through 11 November 2019
ER -