TY - GEN
T1 - LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization
T2 - 31st ACM International Conference on Multimedia, MM 2023
AU - Cao, Xinzi
AU - Zheng, Xiawu
AU - Shen, Yunhang
AU - Li, Ke
AU - Chen, Jie
AU - Lu, Yutong
AU - Tian, Yonghong
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/10/27
Y1 - 2023/10/27
N2 - Weakly Supervised Object Localization (WSOL) aims to localize objects using only image-level labels while ensuring competitive classification performance. However, previous efforts have prioritized localization over classification accuracy in discriminative features, neglecting low-level information. We argue that low-level image representations, such as edges, color, texture, and motion, are crucial for accurate detection. That is, using such information achieves more refined localization, which in turn can promote classification accuracy. In this paper, we propose a unified framework that simultaneously improves localization and classification accuracy, termed LocLoc (Low-level Cues and Local-area Guides). It leverages low-level image cues to explore global and local representations for accurate localization and classification. Specifically, we introduce a GrabCut-Enhanced Generator (GEG) that learns global semantic representations for localization, using graph cuts to enhance low-level information with long-range dependencies captured by the transformer. We further design a Local Feature Digging Module (LFDM) that utilizes low-level cues to guide the learning route of local feature representations for accurate classification. Extensive experiments demonstrate the effectiveness of LocLoc with 84.4% (↑5.2%) Top-1 Loc., 85.8% Top-1 Cls. on CUB-200-2011 and 57.6% (↑1.5%) Top-1 Loc., 78.6% Top-1 Cls. on ILSVRC 2012, indicating that our method outperforms previous approaches by a large margin. Code and models are available at https://github.com/Cliffia123/LocLoc.
AB - Weakly Supervised Object Localization (WSOL) aims to localize objects using only image-level labels while ensuring competitive classification performance. However, previous efforts have prioritized localization over classification accuracy in discriminative features, neglecting low-level information. We argue that low-level image representations, such as edges, color, texture, and motion, are crucial for accurate detection. That is, using such information achieves more refined localization, which in turn can promote classification accuracy. In this paper, we propose a unified framework that simultaneously improves localization and classification accuracy, termed LocLoc (Low-level Cues and Local-area Guides). It leverages low-level image cues to explore global and local representations for accurate localization and classification. Specifically, we introduce a GrabCut-Enhanced Generator (GEG) that learns global semantic representations for localization, using graph cuts to enhance low-level information with long-range dependencies captured by the transformer. We further design a Local Feature Digging Module (LFDM) that utilizes low-level cues to guide the learning route of local feature representations for accurate classification. Extensive experiments demonstrate the effectiveness of LocLoc with 84.4% (↑5.2%) Top-1 Loc., 85.8% Top-1 Cls. on CUB-200-2011 and 57.6% (↑1.5%) Top-1 Loc., 78.6% Top-1 Cls. on ILSVRC 2012, indicating that our method outperforms previous approaches by a large margin. Code and models are available at https://github.com/Cliffia123/LocLoc.
KW - low-level cues
KW - transformer
KW - weakly supervised object localization
UR - https://www.scopus.com/pages/publications/85179549020
U2 - 10.1145/3581783.3612165
DO - 10.1145/3581783.3612165
M3 - Conference contribution
AN - SCOPUS:85179549020
T3 - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
SP - 5655
EP - 5664
BT - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 29 October 2023 through 3 November 2023
ER -