Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval

Lei Wang; Dawei Song; Eyad Elyan

doi:10.1145/2396761.2398433

Lei Wang^*, Dawei Song, Eyad Elyan

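^* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Citations (Scopus)

Abstract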

Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.

Original language	English
Title of host publication	CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages	1303-1312
Number of pages	10
DOI	https://doi.org/10.1145/2396761.2398433
Publication status	Published - 2012
Published externally	Yes
Event	21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States. Duration: 29 Oct 2012 → 2 Nov 2012

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country/Territory	United States
City	Maui, HI
Period	29/10/12 → 2/11/12

Access to document

10.1145/2396761.2398433

Other files and links

Link to publication in Scopus

Cite this

@inproceedings{b279615452814e4fb3316b303d82618e,

title = "Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval",

abstract = "Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.",

keywords = "bag-of-visual-word, content based video retrieval, discriminative visual word, query-by-example, spatial-temporal correlation",

author = "Lei Wang and Dawei Song and Eyad Elyan",

year = "2012",

doi = "10.1145/2396761.2398433",

language = "English",

isbn = "9781450311564",

series = "ACM International Conference Proceeding Series",

pages = "1303--1312",

booktitle = "CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management",

note = "21st ACM International Conference on Information and Knowledge Management, CIKM 2012 ; Conference date: 29-10-2012 Through 02-11-2012",

}

Wang, L, Song, D & Elyan, E 2012, Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval. in CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM International Conference Proceeding Series, pp. 1303-1312, 21st ACM International Conference on Information and Knowledge Management, CIKM 2012, Maui, HI, United States, 29/10/12. https://doi.org/10.1145/2396761.2398433

Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval. / Wang, Lei; Song, Dawei; Elyan, Eyad.
CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. p. 1303-1312 (ACM International Conference Proceeding Series).


TY - GEN

T1 - Improving bag-of-visual-words model with spatial-temporal correlation for video retrieval

AU - Wang, Lei

AU - Song, Dawei

AU - Elyan, Eyad

PY - 2012

Y1 - 2012

N2 - Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.

AB - Most of the state-of-art approaches to Query-by-Example (QBE) video retrieval are based on the Bag-of-visual-Words (BovW) representation of visual content. It, however, ignores the spatial-temporal information, which is important for similarity measurement between videos. Direct incorporation of such information into the video data representation for a large scale data set is computationally expensive in terms of storage and similarity measurement. It is also static regardless of the change of discriminative power of visual words for different queries. To tackle these limitations, in this paper, we propose to discover Spatial-Temporal Correlations (STC) imposed by the query example to improve the BovW model for video retrieval. The STC, in terms of spatial proximity and relative motion coherence between different visual words, is crucial to identify the discriminative power of the visual words. We develop a novel technique to emphasize the most discriminative visual words for similarity measurement, and incorporate this STC-based approach into the standard inverted index architecture. Our approach is evaluated on the TRECVID2002 and CC-WEB-VIDEO datasets for two typical QBE video retrieval tasks respectively. The experimental results demonstrate that it substantially improves the BovW model as well as a state of the art method that also utilizes spatial-temporal information for QBE video retrieval.

KW - bag-of-visual-word

KW - content based video retrieval

KW - discriminative visual word

KW - query-by-example

KW - spatial-temporal correlation

UR - http://www.scopus.com/inward/record.url?scp=84871048635&partnerID=8YFLogxK

U2 - 10.1145/2396761.2398433

DO - 10.1145/2396761.2398433

M3 - Conference contribution

AN - SCOPUS:84871048635

SN - 9781450311564

T3 - ACM International Conference Proceeding Series

SP - 1303

EP - 1312

BT - CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management

T2 - 21st ACM International Conference on Information and Knowledge Management, CIKM 2012

Y2 - 29 October 2012 through 2 November 2012

ER -
