TY - JOUR
T1 - Lite-FPN for keypoint-based monocular 3D object detection
AU - Yang, Lei
AU - Zhang, Xinyu
AU - Li, Jun
AU - Wang, Li
AU - Zhu, Minghan
AU - Zhu, Lei
N1 - Publisher Copyright: © 2023 Elsevier B.V.
PY - 2023/7/8
Y1 - 2023/7/8
AB - 3D object detection from a single image is an essential and challenging task for autonomous driving. Multi-scale feature fusion helps keypoint-based monocular 3D object detectors boost performance across a large range of scales and distances. However, existing FPN modules inevitably increase latency owing to the additional extraction and merging operations on multi-scale feature maps. In this paper, we propose a lightweight feature pyramid network called Lite-FPN for keypoint-based monocular 3D object detectors that performs multi-scale feature fusion only at sparsely distributed keypoint locations. In addition, to alleviate the misalignment between classification score and localization precision, we propose an effective regression loss named attention loss, which assigns larger weights in the training stage to predictions with misaligned classification score and localization precision. Extensive experiments based on several state-of-the-art keypoint-based detectors on the KITTI and nuScenes datasets show that our proposed methods achieve significant accuracy improvements. Meanwhile, SMOKE enhanced with our Lite-FPN module surpasses the baseline enhanced by the classic FPN by over 19 FPS.
KW - Autonomous driving
KW - Lite-FPN
KW - Monocular 3D object detection
KW - Multi-scale feature fusion
UR - http://www.scopus.com/inward/record.url?scp=85153499342&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2023.110517
DO - 10.1016/j.knosys.2023.110517
M3 - Article
AN - SCOPUS:85153499342
SN - 0950-7051
VL - 271
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110517
ER -