Weakly Supervised Action Recognition and Localization Using Web Images

Cuiwei Liu; Xinxiao Wu; Yunde Jia

doi:10.1007/978-3-319-16814-2_42

Cuiwei Liu^*, Xinxiao Wu, Yunde Jia

^*Corresponding author for this work

School of Computer Science and Technology

Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper addresses the problem of joint recognition and localization of actions in videos. We develop a novel Transfer Latent Support Vector Machine (TLSVM) by using Web images and weakly annotated training videos. In order to alleviate the laborious and time-consuming manual annotations of action locations, the model takes training videos which are only annotated with action labels as input. Due to the non-available ground-truth of action locations in videos, the locations are treated as latent variables in our method and are inferred during both training and testing phases. For the purpose of improving the localization accuracy with some prior information of action locations, we collect a number of Web images which are annotated with both action labels and action locations to learn a discriminative model by enforcing the local similarities between videos and Web images. A structural transformation based on randomized clustering forest is used to map Web images to videos for handling the heterogeneous features of Web images and videos. Experiments on two publicly available action datasets demonstrate that the proposed model is effective for both action localization and action recognition.
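The abstract's idea of treating action locations as latent variables can be illustrated with a minimal sketch of latent-variable inference in general, not of the authors' TLSVM. It assumes each candidate location has already been turned into a feature vector, scores every (action label, location) pair with a linear model, and returns the best pair. The function names, class names, and toy numbers are hypothetical, and the Web-image transfer term and the randomized clustering forest are omitted entirely.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def infer(
    weights: Dict[str, Sequence[float]],
    candidates: List[Sequence[float]],
) -> Tuple[str, int, float]:
    """Jointly pick the action label and the latent location.

    weights:    one hypothetical weight vector per action label.
    candidates: one feature vector per candidate location in the video.
    Returns (label, location index, score) with the highest linear score.
    """
    best: Optional[Tuple[str, int, float]] = None
    for label, w in weights.items():
        for h, feat in enumerate(candidates):
            score = dot(w, feat)
            if best is None or score > best[2]:
                best = (label, h, score)
    if best is None:
        raise ValueError("need at least one label and one candidate location")
    return best


# Toy example: two made-up labels, three candidate locations, 2-D features.
weights = {"run": [1.0, 0.0], "jump": [0.0, 1.0]}
candidates = [[0.2, 0.1], [0.9, 0.3], [0.4, 0.7]]
label, location, score = infer(weights, candidates)
print(label, location, score)
```

In a latent SVM, the same maximization over locations is typically reused during training to fill in the unobserved locations before each weight update, which is consistent with the abstract's statement that locations are inferred in both training and testing.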

Original language	English
Title of host publication	Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers
Editors	Daniel Cremers, Hideo Saito, Ian Reid, Ming-Hsuan Yang
Publisher	Springer Verlag
Pages	642-657
Number of pages	16
ISBN (Electronic)	9783319168135
DOIs	https://doi.org/10.1007/978-3-319-16814-2_42
Publication status	Published - 2015
Event	12th Asian Conference on Computer Vision, ACCV 2014 - Singapore, Singapore Duration: 1 Nov 2014 → 5 Nov 2014

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	9007
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	12th Asian Conference on Computer Vision, ACCV 2014
Country/Territory	Singapore
City	Singapore
Period	1/11/14 → 5/11/14

Access to Document

10.1007/978-3-319-16814-2_42

Cite this

Liu, C., Wu, X., & Jia, Y. (2015). Weakly Supervised Action Recognition and Localization Using Web Images. In D. Cremers, H. Saito, I. Reid, & M.-H. Yang (Eds.), Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers (pp. 642-657). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9007). Springer Verlag. https://doi.org/10.1007/978-3-319-16814-2_42

Liu, Cuiwei ; Wu, Xinxiao ; Jia, Yunde. / Weakly Supervised Action Recognition and Localization Using Web Images. Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers. editor / Daniel Cremers ; Hideo Saito ; Ian Reid ; Ming-Hsuan Yang. Springer Verlag, 2015. pp. 642-657 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{04dda9b9da75474691708b2c5937029c,

title = "Weakly Supervised Action Recognition and Localization Using Web Images",

abstract = "This paper addresses the problem of joint recognition and localization of actions in videos. We develop a novel Transfer Latent Support Vector Machine (TLSVM) by using Web images and weakly annotated training videos. In order to alleviate the laborious and time-consuming manual annotations of action locations, the model takes training videos which are only annotated with action labels as input. Due to the non-available ground-truth of action locations in videos, the locations are treated as latent variables in our method and are inferred during both training and testing phases. For the purpose of improving the localization accuracy with some prior information of action locations, we collect a number of Web images which are annotated with both action labels and action locations to learn a discriminative model by enforcing the local similarities between videos and Web images. A structural transformation based on randomized clustering forest is used to map Web images to videos for handling the heterogeneous features of Web images and videos. Experiments on two publicly available action datasets demonstrate that the proposed model is effective for both action localization and action recognition.",

author = "Cuiwei Liu and Xinxiao Wu and Yunde Jia",

note = "Publisher Copyright: {\textcopyright} Springer International Publishing Switzerland 2015.; 12th Asian Conference on Computer Vision, ACCV 2014 ; Conference date: 01-11-2014 Through 05-11-2014",

year = "2015",

doi = "10.1007/978-3-319-16814-2_42",

language = "English",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "642--657",

editor = "Daniel Cremers and Hideo Saito and Ian Reid and Ming-Hsuan Yang",

booktitle = "Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers",

address = "Germany",

}

Liu, C, Wu, X & Jia, Y 2015, Weakly Supervised Action Recognition and Localization Using Web Images. in D Cremers, H Saito, I Reid & M-H Yang (eds), Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9007, Springer Verlag, pp. 642-657, 12th Asian Conference on Computer Vision, ACCV 2014, Singapore, Singapore, 1/11/14. https://doi.org/10.1007/978-3-319-16814-2_42

Weakly Supervised Action Recognition and Localization Using Web Images. / Liu, Cuiwei; Wu, Xinxiao; Jia, Yunde.
Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers. ed. / Daniel Cremers; Hideo Saito; Ian Reid; Ming-Hsuan Yang. Springer Verlag, 2015. p. 642-657 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9007).

TY - GEN

T1 - Weakly Supervised Action Recognition and Localization Using Web Images

AU - Liu, Cuiwei

AU - Wu, Xinxiao

AU - Jia, Yunde

N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.

PY - 2015

Y1 - 2015

N2 - This paper addresses the problem of joint recognition and localization of actions in videos. We develop a novel Transfer Latent Support Vector Machine (TLSVM) by using Web images and weakly annotated training videos. In order to alleviate the laborious and time-consuming manual annotations of action locations, the model takes training videos which are only annotated with action labels as input. Due to the non-available ground-truth of action locations in videos, the locations are treated as latent variables in our method and are inferred during both training and testing phases. For the purpose of improving the localization accuracy with some prior information of action locations, we collect a number of Web images which are annotated with both action labels and action locations to learn a discriminative model by enforcing the local similarities between videos and Web images. A structural transformation based on randomized clustering forest is used to map Web images to videos for handling the heterogeneous features of Web images and videos. Experiments on two publicly available action datasets demonstrate that the proposed model is effective for both action localization and action recognition.

AB - This paper addresses the problem of joint recognition and localization of actions in videos. We develop a novel Transfer Latent Support Vector Machine (TLSVM) by using Web images and weakly annotated training videos. In order to alleviate the laborious and time-consuming manual annotations of action locations, the model takes training videos which are only annotated with action labels as input. Due to the non-available ground-truth of action locations in videos, the locations are treated as latent variables in our method and are inferred during both training and testing phases. For the purpose of improving the localization accuracy with some prior information of action locations, we collect a number of Web images which are annotated with both action labels and action locations to learn a discriminative model by enforcing the local similarities between videos and Web images. A structural transformation based on randomized clustering forest is used to map Web images to videos for handling the heterogeneous features of Web images and videos. Experiments on two publicly available action datasets demonstrate that the proposed model is effective for both action localization and action recognition.

UR - http://www.scopus.com/inward/record.url?scp=84929625992&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-16814-2_42

DO - 10.1007/978-3-319-16814-2_42

M3 - Conference contribution

AN - SCOPUS:84929625992

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 642

EP - 657

BT - Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers

A2 - Cremers, Daniel

A2 - Saito, Hideo

A2 - Reid, Ian

A2 - Yang, Ming-Hsuan

PB - Springer Verlag

T2 - 12th Asian Conference on Computer Vision, ACCV 2014

Y2 - 1 November 2014 through 5 November 2014

ER -

Liu C, Wu X, Jia Y. Weakly Supervised Action Recognition and Localization Using Web Images. In Cremers D, Saito H, Reid I, Yang MH, editors, Computer Vision - ACCV 2014 - 12th Asian Conference on Computer Vision, Revised Selected Papers. Springer Verlag. 2015. p. 642-657. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-319-16814-2_42
