TY - GEN
T1 - Weakly-supervised action recognition and localization via knowledge transfer
AU - Shi, Haichao
AU - Zhang, Xiaoyu
AU - Li, Changsheng
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
N2 - Action recognition and localization have attracted much attention in the past decade. However, a challenging problem is that training models in untrimmed video scenarios typically requires large-scale temporal annotations of action instances, which is impractical in many real-world applications. To alleviate this problem, we propose KTUntrimmedNet, a novel weakly-supervised action recognition framework for untrimmed videos that uses only video-level annotations and transfers information from publicly available trimmed videos to assist model learning. A two-stage method is designed to guarantee an effective transfer strategy: first, the trimmed and untrimmed videos are clustered to find similar classes between them, so as to avoid negative information transfer from the trimmed data; second, we design an invariant module to find common features between trimmed and untrimmed videos to improve performance. Extensive experiments on the standard benchmark datasets THUMOS14 and ActivityNet1.3 clearly demonstrate the efficacy of our proposed method compared with the existing state of the art.
AB - Action recognition and localization have attracted much attention in the past decade. However, a challenging problem is that training models in untrimmed video scenarios typically requires large-scale temporal annotations of action instances, which is impractical in many real-world applications. To alleviate this problem, we propose KTUntrimmedNet, a novel weakly-supervised action recognition framework for untrimmed videos that uses only video-level annotations and transfers information from publicly available trimmed videos to assist model learning. A two-stage method is designed to guarantee an effective transfer strategy: first, the trimmed and untrimmed videos are clustered to find similar classes between them, so as to avoid negative information transfer from the trimmed data; second, we design an invariant module to find common features between trimmed and untrimmed videos to improve performance. Extensive experiments on the standard benchmark datasets THUMOS14 and ActivityNet1.3 clearly demonstrate the efficacy of our proposed method compared with the existing state of the art.
KW - Action localization
KW - Action recognition
KW - Knowledge transfer
UR - http://www.scopus.com/inward/record.url?scp=85086141617&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31654-9_18
DO - 10.1007/978-3-030-31654-9_18
M3 - Conference contribution
AN - SCOPUS:85086141617
SN - 9783030316532
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 205
EP - 216
BT - Pattern Recognition and Computer Vision - 2nd Chinese Conference, PRCV 2019, Proceedings, Part I
A2 - Lin, Zhouchen
A2 - Wang, Liang
A2 - Tan, Tieniu
A2 - Yang, Jian
A2 - Shi, Guangming
A2 - Zheng, Nanning
A2 - Chen, Xilin
A2 - Zhang, Yanning
PB - Springer
T2 - 2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019
Y2 - 8 November 2019 through 11 November 2019
ER -