LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization

  • Xinzi Cao
  • , Xiawu Zheng
  • , Yunhang Shen
  • , Ke Li
  • , Jie Chen
  • , Yutong Lu*
  • , Yonghong Tian*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Citation (Scopus)

Abstract

Weakly Supervised Object Localization (WSOL) aims to localize objects using only image-level labels while ensuring competitive classification performance. However, previous efforts have prioritized localization over classification accuracy in discriminative features, in which low-level information is neglected. We argue that low-level image representations, such as edges, color, texture, and motions are crucial for accurate detection. That is, using such information further achieves more refined localization, which can be used to promote classification accuracy. In this paper, we propose a unified framework that simultaneously improves localization and classification accuracy, termed as LocLoc (Low-level Cues and Local-area Guides). It leverages low-level image cues to explore global and local representations for accurate localization and classification. Specifically, we introduce a GrabCut-Enhanced Generator (GEG) to learn global semantic representations for localization based on graph cuts to enhance low-level information based on long-range dependencies captured by the transformer. We further design a Local Feature Digging Module (LFDM) that utilizes low-level cues to guide the learning route of local feature representations for accurate classification. Extensive experiments demonstrate the effectiveness of LocLoc with 84.4%(↑5.2%) Top-1 Loc., 85.8% Top-1 Cls. on CUB-200-2011 and 57.6% (↑1.5%) Top-1 Loc., 78.6% Top-1Cls. on ILSVRC 2012, indicating that our method achieves competitive performance with a large margin compared to previous approaches. Code and models are available at https://github.com/Cliffia123/LocLoc.

Original languageEnglish
Title of host publicationMM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
PublisherAssociation for Computing Machinery, Inc
Pages5655-5664
Number of pages10
ISBN (Electronic)9798400701085
DOIs
Publication statusPublished - 27 Oct 2023
Externally publishedYes
Event31st ACM International Conference on Multimedia, MM 2023 - Ottawa, Canada
Duration: 29 Oct 20233 Nov 2023

Publication series

NameMM 2023 - Proceedings of the 31st ACM International Conference on Multimedia

Conference

Conference31st ACM International Conference on Multimedia, MM 2023
Country/TerritoryCanada
CityOttawa
Period29/10/233/11/23

Keywords

  • low-level cues
  • transformer
  • weakly supervised object localization

Fingerprint

Dive into the research topics of 'LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization'. Together they form a unique fingerprint.

Cite this