CNN vs. SIFT for image retrieval: Alternative or complementary?

Ke Yan, Yaowei Wang*, Dawei Liang, Tiejun Huang, Yonghong Tian

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

69 Citations (Scopus)

Abstract

Over the past decade, SIFT has been widely used in vision tasks such as image retrieval. In recent years, however, deep convolutional neural network (CNN) features have achieved state-of-the-art performance in several tasks, such as image classification and object detection. A natural question thus arises: for the image retrieval task, can CNN features substitute for SIFT? In this paper, we experimentally demonstrate that the two kinds of features are highly complementary. Building on this finding, we propose an image representation model, complementary CNN and SIFT (CCS), which fuses CNN and SIFT in a multi-level and complementary way. In particular, it can simultaneously describe scene-level, object-level and point-level contents in images. Extensive experiments on four image retrieval benchmarks show that CCS achieves state-of-the-art retrieval results.

Original language: English
Title of host publication: MM 2016 - Proceedings of the 2016 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 407-411
Number of pages: 5
ISBN (Electronic): 9781450336031
Publication status: Published - 1 Oct 2016
Externally published: Yes
Event: 24th ACM Multimedia Conference, MM 2016 - Amsterdam, Netherlands
Duration: 15 Oct 2016 → 19 Oct 2016

Publication series

Name: MM 2016 - Proceedings of the 2016 ACM Multimedia Conference

Conference

Conference: 24th ACM Multimedia Conference, MM 2016
Country/Territory: Netherlands
City: Amsterdam
Period: 15/10/16 → 19/10/16

Keywords

  • CNN
  • Complementary CNN and SIFT (CCS)
  • Multi-level image representation
  • SIFT
