Recognizing key segments of videos for video annotation by learning from web image sets

Hao Song, Xinxiao Wu*, Wei Liang, Yunde Jia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

In this paper, we propose an approach that infers the labels of unlabeled consumer videos and, at the same time, recognizes the key segments of those videos by learning from Web image sets. The key segments are recognized automatically by transferring knowledge learned from related Web image sets to the videos. We introduce an adaptive latent structural SVM method that adapts classifiers pre-learned on Web image sets into an optimal target classifier, where the locations of the key segments are modeled as latent variables because ground-truth key segments are not available. We use a limited number of labeled videos together with abundant labeled Web images to train the annotation models, which significantly alleviates the time-consuming and labor-intensive collection of large numbers of labeled training videos. Experiments on two challenging datasets, Columbia Consumer Video (CCV) and TRECVID 2014 Multimedia Event Detection (MED2014), show that our method performs better than state-of-the-art methods.
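For readers unfamiliar with the latent structural SVM framework the abstract builds on, a minimal sketch of the generic objective (following the standard latent structural SVM formulation, not the paper's specific adaptive transfer objective) is given below. Here the key-segment location in each video is treated as the latent variable h; the feature map Φ, label loss Δ, and trade-off C are generic placeholders, not notation taken from the paper.

```latex
% Sketch of the generic latent structural SVM objective, assuming:
%   x_i : the i-th training video, y_i : its label,
%   h   : a latent variable indexing a candidate key-segment location,
%   \Phi: a joint feature map over (video, label, segment),
%   \Delta: a label loss, C : regularization trade-off.
% The paper's method additionally adapts classifiers pre-learned on Web
% image sets; those transfer terms are not reproduced here.
\min_{w}\; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \Big[
      \max_{\hat{y},\,\hat{h}} \big( w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y}) \big)
      \;-\; \max_{h}\; w^{\top}\Phi(x_i, y_i, h)
    \Big]
```

Because the ground-truth key segments are unavailable, the inner maximization over h fills in the most plausible segment for the correct label, while the first maximization finds the most violating label-and-segment pair, yielding the usual latent margin formulation.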

Original language: English
Pages (from-to): 6111-6126
Number of pages: 16
Journal: Multimedia Tools and Applications
Volume: 76
Issue number: 5
DOIs
Publication status: Published - 1 Mar 2017

Keywords

  • Image set
  • Key segment
  • Transfer learning
  • Video annotation
