Abstract
In this paper, we propose an approach that infers the labels of unlabeled consumer videos and simultaneously recognizes the key segments of those videos by learning from Web image sets. The key segments are recognized automatically by transferring knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method that adapts classifiers pre-learned on Web image sets into an optimal target classifier, where the locations of the key segments are modeled as latent variables because ground-truth key segments are not available. We train the annotation models with a limited number of labeled videos and abundant labeled Web images, which significantly alleviates the time-consuming and labor-intensive collection of large numbers of labeled training videos. Experiments on two challenging datasets, Columbia's Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014), show that our method outperforms state-of-the-art methods.
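The latent-variable formulation described above can be illustrated with a minimal sketch: treating the key-segment index as the latent variable h, a video is scored by its best-scoring segment, f(x) = max_h w·φ(x, h). The function and feature values below are hypothetical toy inputs, not the paper's actual model or features; this only shows the max-over-segments inference step, not the full adaptive training procedure.

```python
import numpy as np

def latent_segment_score(w, segment_features):
    """Score a video by its best-scoring ("key") segment.

    The segment index plays the role of the latent variable h:
        f(x) = max_h  w . phi(x, h)
    Returns the video-level score and the inferred key-segment index.
    """
    scores = segment_features @ w          # one score per segment
    h_star = int(np.argmax(scores))        # inferred key segment
    return float(scores[h_star]), h_star

# Hypothetical example: a video with 4 segments, 3-d features each.
w = np.array([1.0, -0.5, 0.2])
segments = np.array([
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.5],   # this segment scores highest under w
    [0.3, 0.4, 0.2],
    [0.0, 0.7, 0.1],
])
score, key_idx = latent_segment_score(w, segments)
label = 1 if score > 0 else -1             # video-level annotation
```

During training, such a step alternates with updating w: the current classifier infers each positive video's key segment, and the classifier is then re-fit treating those segments as fixed, which is the usual latent SVM alternation.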
Original language | English |
---|---|
Pages (from-to) | 6111-6126 |
Number of pages | 16 |
Journal | Multimedia Tools and Applications |
Volume | 76 |
Issue number | 5 |
DOIs | |
Publication status | Published - 1 Mar 2017 |
Keywords
- Image set
- Key segment
- Transfer learning
- Video annotation