Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval

Lei Wang*, Dawei Song, Eyad Elyan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Citations (Scopus)

Abstract

Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.

Original languageEnglish
Title of host publicationCIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages1303-1312
Number of pages10
DOIs
Publication statusPublished - 2012
Externally publishedYes
Event21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: 29 Oct 20122 Nov 2012

Publication series

NameACM International Conference Proceeding Series

Conference

Conference21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country/TerritoryUnited States
CityMaui, HI
Period29/10/122/11/12

Keywords

  • bag-of-visual-word
  • content based video retrieval
  • discriminative visual word
  • query-by-example
  • spatial-temporal correlation

Fingerprint

Dive into the research topics of 'Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval'. Together they form a unique fingerprint.

Cite this