TY - GEN
T1 - LCNet
T2 - 6th International Conference on Digital Signal Processing, ICDSP 2022
AU - Yi, Xin
AU - Ma, Bo
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/2/25
Y1 - 2022/2/25
AB - Object detection is a widely studied task in computer vision. In recent years, milestone approaches and solid benchmarks have been proposed, which have significantly advanced related research. Previous object detection methods follow a common paradigm: the classification head and the regression head share the same feature extracted by the backbone network. In this paper, we revisit this paradigm for two-stage detectors and show that the regression head can achieve better results by using local features. In our proposed Location Combination Networks (LCNet), we extract the effective regions of the feature in a Laplace way, and we introduce an auxiliary confidence gain loss, an Intersection over Union (IoU) gain loss, and a distribution loss to guide its convergence. In the classification head, we combine these local features into the global feature for better classification. In the regression head, by ranking these effective regions in the spatial dimension, we select the local features closest to each foreground boundary and use them to predict the offset of that boundary. Finally, we combine the locations of the four boundaries to obtain the final bounding box prediction. Extensive experimental results on the MS COCO benchmark validate the effectiveness of the proposed method.
KW - Deep learning
KW - Local feature mining
KW - Location combination
KW - Object detection
UR - http://www.scopus.com/inward/record.url?scp=85133815244&partnerID=8YFLogxK
U2 - 10.1145/3529570.3529596
DO - 10.1145/3529570.3529596
M3 - Conference contribution
AN - SCOPUS:85133815244
T3 - ACM International Conference Proceeding Series
SP - 152
EP - 158
BT - ICDSP 2022 - 2022 6th International Conference on Digital Signal Processing
PB - Association for Computing Machinery
Y2 - 25 February 2022 through 27 February 2022
ER -