Recognizing key segments of videos for video annotation by learning from web image sets

Hao Song; Xinxiao Wu; Wei Liang; Yunde Jia

doi:10.1007/s11042-016-3253-1

Hao Song, Xinxiao Wu^*, Wei Liang, Yunde Jia

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

In this paper, we propose an approach for video annotation that infers the labels of unlabeled consumer videos and, at the same time, recognizes the key segments of the videos by learning from Web image sets. The key segments are recognized automatically by transferring knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method that adapts the classifiers pre-learned from Web image sets to an optimal target classifier, where the locations of the key segments are modeled as latent variables because ground truth for the key segments is not available. We train the annotation models with a limited number of labeled videos and abundant labeled Web images, which significantly alleviates the time-consuming and labor-intensive collection of a large number of labeled training videos. Experiments on two challenging datasets, Columbia’s Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014), show that our method performs better than state-of-the-art methods.
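
The idea of treating the key-segment location as a latent variable can be illustrated with a minimal sketch. This is not the paper's formulation: it assumes a simple linear model in which the target score of a segment is a pre-learned source score plus a learned correction, and it picks the highest-scoring segment as the key segment. The names `infer_key_segment`, `source_weights`, and `delta_weights` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def infer_key_segment(
    source_weights: Sequence[float],
    delta_weights: Sequence[float],
    segments: List[Sequence[float]],
) -> Tuple[int, float]:
    """Return (index, score) of the highest-scoring segment of one video.

    Assumption (not taken from the paper): the adapted target score is
    source_weights . x + delta_weights . x, where source_weights stands in
    for a classifier pre-learned on Web images and delta_weights for the
    adaptation learned from the few labeled videos.
    """
    scores = [dot(source_weights, s) + dot(delta_weights, s) for s in segments]
    # The latent variable is the segment index; inference maximizes over it.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Three toy segments, each described by a 2-dimensional feature vector.
    video = [[1, 0], [0, 2], [1, 1]]
    print(infer_key_segment([0.5, 0.5], [0.0, 0.5], video))
```

In a latent SVM style of training, this maximization would be repeated for every training video each time the weights are updated, so that the chosen key segments and the classifier are refined together.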

Original language	English
Pages (from-to)	6111-6126
Number of pages	16
Journal	Multimedia Tools and Applications
Volume	76
Issue number	5
DOI	https://doi.org/10.1007/s11042-016-3253-1
Publication status	Published - 1 Mar 2017

Cite this

@article{9080889d2da64a01a9de9c8da3d962e9,

title = "Recognizing key segments of videos for video annotation by learning from web image sets",

abstract = "In this paper, we propose an approach of inferring the labels of unlabeled consumer videos and at the same time recognizing the key segments of the videos by learning from Web image sets for video annotation. The key segments of the videos are automatically recognized by transferring the knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method to adapt the pre-learned classifiers using Web image sets to an optimal target classifier, where the locations of the key segments are modeled as latent variables because the ground-truth of key segments are not available. We utilize a limited number of labeled videos and abundant labeled Web images for training annotation models, which significantly alleviates the time-consuming and labor-expensive collection of a large number of labeled training videos. Experiment on the two challenge datasets Columbia{\textquoteright}s Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014) shows our method performs better than state-of-art methods.",

keywords = "Image set, Key segment, Transfer learning, Video annotation",

author = "Hao Song and Xinxiao Wu and Wei Liang and Yunde Jia",

note = "Publisher Copyright: {\textcopyright} 2016, Springer Science+Business Media New York.",

year = "2017",

month = mar,

day = "1",

doi = "10.1007/s11042-016-3253-1",

language = "English",

volume = "76",

pages = "6111--6126",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer Netherlands",

number = "5",

}

TY - JOUR

T1 - Recognizing key segments of videos for video annotation by learning from web image sets

AU - Song, Hao

AU - Wu, Xinxiao

AU - Liang, Wei

AU - Jia, Yunde

PY - 2017/3/1

Y1 - 2017/3/1

N2 - In this paper, we propose an approach of inferring the labels of unlabeled consumer videos and at the same time recognizing the key segments of the videos by learning from Web image sets for video annotation. The key segments of the videos are automatically recognized by transferring the knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method to adapt the pre-learned classifiers using Web image sets to an optimal target classifier, where the locations of the key segments are modeled as latent variables because the ground-truth of key segments are not available. We utilize a limited number of labeled videos and abundant labeled Web images for training annotation models, which significantly alleviates the time-consuming and labor-expensive collection of a large number of labeled training videos. Experiment on the two challenge datasets Columbia’s Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014) shows our method performs better than state-of-art methods.

AB - In this paper, we propose an approach of inferring the labels of unlabeled consumer videos and at the same time recognizing the key segments of the videos by learning from Web image sets for video annotation. The key segments of the videos are automatically recognized by transferring the knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method to adapt the pre-learned classifiers using Web image sets to an optimal target classifier, where the locations of the key segments are modeled as latent variables because the ground-truth of key segments are not available. We utilize a limited number of labeled videos and abundant labeled Web images for training annotation models, which significantly alleviates the time-consuming and labor-expensive collection of a large number of labeled training videos. Experiment on the two challenge datasets Columbia’s Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014) shows our method performs better than state-of-art methods.

KW - Image set

KW - Key segment

KW - Transfer learning

KW - Video annotation

UR - http://www.scopus.com/inward/record.url?scp=84956966269&partnerID=8YFLogxK

U2 - 10.1007/s11042-016-3253-1

DO - 10.1007/s11042-016-3253-1

M3 - Article

AN - SCOPUS:84956966269

SN - 1380-7501

VL - 76

SP - 6111

EP - 6126

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 5

ER -
