Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval

Lei Wang; Dawei Song; Eyad Elyan

doi:10.1145/2396761.2398433

Lei Wang^*, Dawei Song, Eyad Elyan

^*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Citations (Scopus)

Abstract

Most state-of-the-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. This representation, however, ignores spatial-temporal information, which is important for measuring similarity between videos. Directly incorporating such information into the video data representation for a large-scale data set is computationally expensive in terms of both storage and similarity measurement. Such a representation is also static: it does not adapt to changes in the discriminative power of visual words across different queries. To tackle these limitations, we propose to discover the Spatial-Temporal Correlations (STC) imposed by the query example in order to improve the BovW model for video retrieval. The STC, expressed in terms of spatial proximity and relative motion coherence between different visual words, is crucial for identifying the discriminative power of the visual words. We develop a novel technique that emphasizes the most discriminative visual words for similarity measurement, and we incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks, respectively. The experimental results demonstrate that it substantially improves on the BovW model as well as on a state-of-the-art method that also utilizes spatial-temporal information for QBE video retrieval.
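
The abstract does not say how the STC-based emphasis is computed, so the following is only a minimal sketch of the general idea it describes: BovW matching over an inverted index, with an optional per-query weight for each visual word. The function names, the `word_weights` parameter, and the weighted term-frequency score are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def build_inverted_index(videos):
    """Map each visual word id to {video_id: term frequency}."""
    index = defaultdict(dict)
    for video_id, words in videos.items():
        for word, tf in Counter(words).items():
            index[word][video_id] = tf
    return index


def score(query_words, index, word_weights=None):
    """Rank videos by weighted term-frequency overlap with the query.

    word_weights is a hypothetical per-query weight for each visual word.
    A word without a weight defaults to 1.0, which reduces the score to
    plain (unweighted) BovW matching.
    """
    word_weights = word_weights or {}
    scores = defaultdict(float)
    for word, q_tf in Counter(query_words).items():
        w = word_weights.get(word, 1.0)
        # Only videos in this word's posting list are touched.
        for video_id, tf in index.get(word, {}).items():
            scores[video_id] += w * q_tf * tf
    # Highest score first; ties broken by video id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data: visual word ids per video (hypothetical).
videos = {"a": [1, 1, 2], "b": [2, 3, 3]}
index = build_inverted_index(videos)
plain = score([1, 2, 3], index)
weighted = score([1, 2, 3], index, word_weights={3: 2.0})
```

Because the weights are applied at query time, the index itself stays unchanged across queries, which is one plausible reading of why a query-dependent weighting fits the standard inverted index architecture.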

Original language	English
Title of host publication	CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages	1303-1312
Number of pages	10
DOIs	https://doi.org/10.1145/2396761.2398433
Publication status	Published - 2012
Externally published	Yes
Event	21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration	29 Oct 2012 → 2 Nov 2012

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country/Territory	United States
City	Maui, HI
Period	29/10/12 → 2/11/12

Keywords

bag-of-visual-word
content based video retrieval
discriminative visual word
query-by-example
spatial-temporal correlation

Access to Document

10.1145/2396761.2398433

Cite this

Wang, L., Song, D., & Elyan, E. (2012). Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval. In CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management (pp. 1303-1312). (ACM International Conference Proceeding Series). https://doi.org/10.1145/2396761.2398433

@inproceedings{b279615452814e4fb3316b303d82618e,
  title = "Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval",
  abstract = "Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.",
  keywords = "bag-of-visual-word, content based video retrieval, discriminative visual word, query-by-example, spatial-temporal correlation",
  author = "Lei Wang and Dawei Song and Eyad Elyan",
  year = "2012",
  doi = "10.1145/2396761.2398433",
  language = "English",
  isbn = "9781450311564",
  series = "ACM International Conference Proceeding Series",
  pages = "1303--1312",
  booktitle = "CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management",
  note = "21st ACM International Conference on Information and Knowledge Management, CIKM 2012 ; Conference date: 29-10-2012 Through 02-11-2012",
}

Wang, L, Song, D & Elyan, E 2012, Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval. in CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM International Conference Proceeding Series, pp. 1303-1312, 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, United States, 29/10/12. https://doi.org/10.1145/2396761.2398433

Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval. / Wang, Lei; Song, Dawei; Elyan, Eyad.
CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. p. 1303-1312 (ACM International Conference Proceeding Series).

TY - GEN
T1 - Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval
AU - Wang, Lei
AU - Song, Dawei
AU - Elyan, Eyad
PY - 2012
Y1 - 2012
AB - Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.
KW - bag-of-visual-word
KW - content based video retrieval
KW - discriminative visual word
KW - query-by-example
KW - spatial-temporal correlation
UR - http://www.scopus.com/inward/record.url?scp=84871048635&partnerID=8YFLogxK
U2 - 10.1145/2396761.2398433
DO - 10.1145/2396761.2398433
M3 - Conference contribution
AN - SCOPUS:84871048635
SN - 9781450311564
T3 - ACM International Conference Proceeding Series
SP - 1303
EP - 1312
BT - CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
T2 - 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Y2 - 29 October 2012 through 2 November 2012
ER -
