Dynamically visual disambiguation of keyword-based image search

  • Yazhou Yao
  • Zeren Sun
  • Fumin Shen*
  • Li Liu
  • Limin Wang
  • Fan Zhu
  • Lizhong Ding
  • Gangshan Wu
  • Ling Shao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of such methods is visual polysemy. To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation. Compared with existing methods, the primary advantage of our approach is that it can adapt to dynamic changes in the search results. Our proposed framework consists of two major steps: we first discover and dynamically select text queries according to the image search results; we then employ the proposed saliency-guided deep multi-instance learning network to remove outliers and learn classification models for visual disambiguation. Extensive experiments demonstrate the superiority of our proposed approach.
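
The outlier-removal idea in the second step can be pictured with a generic multi-instance learning (MIL) sketch. This is not the authors' saliency-guided network: it assumes each image is a "bag" of region scores that some upstream classifier has already produced, and it swaps in plain max-pooling MIL, a standard formulation in which a bag is as positive as its most positive instance. The names `bag_score`, `filter_bags`, the `threshold` parameter, and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def bag_score(instance_scores: List[float]) -> float:
    """Max-pooling MIL: the bag score is the highest instance score."""
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


def filter_bags(
    bags: Dict[str, List[float]], threshold: float = 0.5
) -> Tuple[Dict[str, List[float]], List[str]]:
    """Keep bags whose score reaches the threshold; report the rest as outliers."""
    kept: Dict[str, List[float]] = {}
    dropped: List[str] = []
    for image_id, scores in bags.items():
        if bag_score(scores) >= threshold:
            kept[image_id] = scores
        else:
            dropped.append(image_id)
    return kept, dropped


# Hypothetical per-region scores for two retrieved images (illustrative only).
bags = {
    "img_a": [0.1, 0.9, 0.2],  # one region matches the target sense strongly
    "img_b": [0.2, 0.3, 0.1],  # no region matches: treated as an outlier
}
kept, dropped = filter_bags(bags, threshold=0.5)
```

Under these assumptions, `img_a` is kept because a single strongly matching region is enough to make the whole bag positive, while `img_b` is flagged as an outlier.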

Original language: English
Title of host publication: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Editors: Sarit Kraus
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 996-1002
Number of pages: 7
ISBN (Electronic): 9780999241141
Publication status: Published - 2019
Externally published: Yes
Event: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
Duration: 10 Aug 2019 to 16 Aug 2019

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2019-August
ISSN (Print): 1045-0823

Conference

Conference: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Country/Territory: China
City: Macao
Period: 10/08/19 to 16/08/19
