TY - GEN
T1 - Rethinking the Localization in Weakly Supervised Object Localization
AU - Xu, Rui
AU - Luo, Yong
AU - Hu, Han
AU - Du, Bo
AU - Shen, Jialie
AU - Wen, Yonggang
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/10/26
Y1 - 2023/10/26
N2 - Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision. This task is to localize the objects in the images given only the image-level supervision. Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task. However, existing solutions under this pipeline usually suffer from the following drawbacks: 1) they are not flexible since they can only localize one object for each image due to the adopted single-class regression (SCR) for localization; 2) the generated pseudo bounding boxes may be noisy, but the negative impact of such noise is not well addressed. To remedy these drawbacks, we first propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background. Then we design a weighted entropy (WE) loss using the unlabeled data to reduce the negative impact of noisy bounding boxes. Extensive experiments on the popular CUB-200-2011 and ImageNet-1K datasets demonstrate the effectiveness of our method.
AB - Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision. This task is to localize the objects in the images given only the image-level supervision. Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task. However, existing solutions under this pipeline usually suffer from the following drawbacks: 1) they are not flexible since they can only localize one object for each image due to the adopted single-class regression (SCR) for localization; 2) the generated pseudo bounding boxes may be noisy, but the negative impact of such noise is not well addressed. To remedy these drawbacks, we first propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background. Then we design a weighted entropy (WE) loss using the unlabeled data to reduce the negative impact of noisy bounding boxes. Extensive experiments on the popular CUB-200-2011 and ImageNet-1K datasets demonstrate the effectiveness of our method.
KW - binary-class detector
KW - noisy label
KW - object localization
KW - weakly supervised
KW - weighted entropy
UR - http://www.scopus.com/inward/record.url?scp=85179558481&partnerID=8YFLogxK
U2 - 10.1145/3581783.3611959
DO - 10.1145/3581783.3611959
M3 - Conference contribution
AN - SCOPUS:85179558481
T3 - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
SP - 5484
EP - 5494
BT - MM 2023 - Proceedings of the 31st ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
T2 - 31st ACM International Conference on Multimedia, MM 2023
Y2 - 29 October 2023 through 3 November 2023
ER -