Extracting Key Segments of Videos for Event Detection by Learning from Web Sources

Hao Song, Xinxiao Wu*, Wennan Yu, Yunde Jia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)

Abstract

In this paper, we present a novel approach to extracting the key segments for event detection in unconstrained videos. The key segments are automatically extracted by transferring the knowledge learned from Web images and Web videos to consumer videos. We propose an adaptive latent structural support vector machine model, where the locations of key segments in videos are regarded as latent variables because the ground-truth key-segment locations are unavailable in the training data. To alleviate the time-consuming and labor-intensive manual annotation of huge amounts of training videos, a large number of loosely labeled Web images and videos are collected from Web sources. Additionally, a limited number of labeled consumer videos are utilized to guarantee the precision of the model. Considering the semantic diversity of key segments, we learn a set of concepts as the semantic description of key segments and explore the temporal information of concepts to capture the sequential relations between segments. The concepts are automatically discovered from Web images and videos together with their associated tags and description sentences. Comprehensive experiments on the Columbia Consumer Video and the TRECVID 2014 Multimedia Event Detection datasets demonstrate that our method outperforms the state-of-the-art methods.
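The core latent-variable idea can be illustrated with a minimal sketch. This is not the paper's full adaptive model (which additionally adapts from Web sources and exploits concept temporal relations); it only shows, under a simple linear-scoring assumption with hypothetical function names, how the unobserved key-segment location is completed by maximizing the model score, as in latent structural SVM inference:

```python
def score(w, segment_feat):
    # Linear score of one segment's concept-based feature vector
    # (hypothetical representation; the paper uses learned concepts).
    return sum(wi * xi for wi, xi in zip(w, segment_feat))

def infer_latent(w, video_feats):
    # Latent-variable completion: choose the segment location z that
    # maximizes the score under the current model w.
    return max(range(len(video_feats)), key=lambda z: score(w, video_feats[z]))

def predict(w, video_feats):
    # Event score for a video = score of its best (latent) key segment.
    return score(w, video_feats[infer_latent(w, video_feats)])
```

During training, this inference step would alternate with updating `w` on the completed latent assignments, the standard latent-SVM strategy.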

Original language: English
Pages (from-to): 1088-1100
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 20
Issue number: 5
DOI
Publication status: Published - May 2018

