Lite-FPN for keypoint-based monocular 3D object detection

Lei Yang; Xinyu Zhang; Jun Li; Li Wang; Minghan Zhu; Lei Zhu

doi:10.1016/j.knosys.2023.110517

Lei Yang, Xinyu Zhang*, Jun Li, Li Wang, Minghan Zhu, Lei Zhu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

3D object detection from a single image is an essential and challenging task for autonomous driving. Multi-scale feature fusion helps keypoint-based monocular 3D object detectors perform well across a large range of scales and distances. However, existing feature pyramid network (FPN) modules inevitably increase latency owing to the additional extraction and merging operations on multi-scale feature maps. In this paper, we propose a lightweight feature pyramid network called Lite-FPN for keypoint-based monocular 3D object detectors, which performs multi-scale feature fusion only at sparsely distributed keypoint locations. In addition, to alleviate the misalignment between classification score and localization precision, we propose an effective regression loss named attention loss, which assigns larger weights in the training stage to predictions whose classification score and localization precision are misaligned. Extensive experiments with several state-of-the-art keypoint-based detectors on the KITTI and nuScenes datasets show that the proposed methods achieve significant accuracy improvements. Meanwhile, SMOKE enhanced with our Lite-FPN module surpasses the baseline enhanced by the classic FPN by more than 19 FPS.
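The idea of fusing multi-scale features only at sparse keypoint locations can be illustrated with a small sketch. This is not the authors' implementation: the function name `gather_at_keypoints`, the nested-list feature layout, the nearest-cell projection, and the element-wise sum used as the fusion step are all assumptions made for illustration. The point it shows is that the cost scales with the number of keypoints rather than with the size of the feature maps.

```python
from typing import List, Sequence, Tuple

# A feature map is indexed as [row][col][channel] (hypothetical layout).
FeatureMap = List[List[List[float]]]


def gather_at_keypoints(
    pyramid: Sequence[Tuple[int, FeatureMap]],
    keypoints: Sequence[Tuple[int, int]],
    base_stride: int,
) -> List[List[float]]:
    """Fuse multi-scale features only at the given keypoints.

    pyramid: (stride, feature_map) pairs that share one channel count.
             Each stride is assumed to be a multiple of base_stride.
    keypoints: (row, col) positions on the grid with stride base_stride.
    Fusion is an element-wise sum, which is an assumption of this sketch.
    """
    channels = len(pyramid[0][1][0][0])
    fused = []
    for row, col in keypoints:
        acc = [0.0] * channels
        for stride, fmap in pyramid:
            # Project the keypoint onto this level's coarser grid.
            scale = stride // base_stride
            r = min(row // scale, len(fmap) - 1)
            c = min(col // scale, len(fmap[0]) - 1)
            for ch in range(channels):
                acc[ch] += fmap[r][c][ch]
        fused.append(acc)
    return fused


def constant_map(height: int, width: int, values: Sequence[float]) -> FeatureMap:
    """Build a toy feature map whose every cell holds the same vector."""
    return [[list(values) for _ in range(width)] for _ in range(height)]


# Toy pyramid with strides 4, 8 and 16 and two channels per cell.
pyramid = [
    (4, constant_map(8, 8, [1.0, 2.0])),
    (8, constant_map(4, 4, [10.0, 20.0])),
    (16, constant_map(2, 2, [100.0, 200.0])),
]
keypoints = [(0, 0), (5, 7)]  # positions on the stride-4 grid
fused = gather_at_keypoints(pyramid, keypoints, base_stride=4)
print(fused)  # [[111.0, 222.0], [111.0, 222.0]]
```

A dense FPN would instead upsample and merge every cell of every level before any prediction is read out; here only two cells per level are touched, one for each keypoint.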

Original language	English
Article number	110517
Journal	Knowledge-Based Systems
Volume	271
DOIs	https://doi.org/10.1016/j.knosys.2023.110517
Publication status	Published - 8 Jul 2023
Externally published	Yes

Keywords

Autonomous driving
Lite-FPN
Monocular 3D object detection
Multi-scale feature fusion

Cite this

Yang, L., Zhang, X., Li, J., Wang, L., Zhu, M., & Zhu, L. (2023). Lite-FPN for keypoint-based monocular 3D object detection. Knowledge-Based Systems, 271, Article 110517. https://doi.org/10.1016/j.knosys.2023.110517

@article{4395d9c0120a47118c0344d33f159558,

title = "Lite-FPN for keypoint-based monocular 3D object detection",

abstract = "3D object detection from a single image is an essential and challenging task for autonomous driving. Multi-scale feature fusion helps keypoint-based monocular 3D object detectors perform well across a large range of scales and distances. However, existing feature pyramid network (FPN) modules inevitably increase latency owing to the additional extraction and merging operations on multi-scale feature maps. In this paper, we propose a lightweight feature pyramid network called Lite-FPN for keypoint-based monocular 3D object detectors, which performs multi-scale feature fusion only at sparsely distributed keypoint locations. In addition, to alleviate the misalignment between classification score and localization precision, we propose an effective regression loss named attention loss, which assigns larger weights in the training stage to predictions whose classification score and localization precision are misaligned. Extensive experiments with several state-of-the-art keypoint-based detectors on the KITTI and nuScenes datasets show that the proposed methods achieve significant accuracy improvements. Meanwhile, SMOKE enhanced with our Lite-FPN module surpasses the baseline enhanced by the classic FPN by more than 19 FPS.",

keywords = "Autonomous driving, Lite-FPN, Monocular 3D object detection, Multi-scale feature fusion",

author = "Lei Yang and Xinyu Zhang and Jun Li and Li Wang and Minghan Zhu and Lei Zhu",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier B.V.",

year = "2023",

month = jul,

day = "8",

doi = "10.1016/j.knosys.2023.110517",

language = "English",

volume = "271",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Lite-FPN for keypoint-based monocular 3D object detection

AU - Yang, Lei

AU - Zhang, Xinyu

AU - Li, Jun

AU - Wang, Li

AU - Zhu, Minghan

AU - Zhu, Lei

PY - 2023/7/8

Y1 - 2023/7/8

AB - 3D object detection from a single image is an essential and challenging task for autonomous driving. Multi-scale feature fusion helps keypoint-based monocular 3D object detectors perform well across a large range of scales and distances. However, existing feature pyramid network (FPN) modules inevitably increase latency owing to the additional extraction and merging operations on multi-scale feature maps. In this paper, we propose a lightweight feature pyramid network called Lite-FPN for keypoint-based monocular 3D object detectors, which performs multi-scale feature fusion only at sparsely distributed keypoint locations. In addition, to alleviate the misalignment between classification score and localization precision, we propose an effective regression loss named attention loss, which assigns larger weights in the training stage to predictions whose classification score and localization precision are misaligned. Extensive experiments with several state-of-the-art keypoint-based detectors on the KITTI and nuScenes datasets show that the proposed methods achieve significant accuracy improvements. Meanwhile, SMOKE enhanced with our Lite-FPN module surpasses the baseline enhanced by the classic FPN by more than 19 FPS.

KW - Autonomous driving

KW - Lite-FPN

KW - Monocular 3D object detection

KW - Multi-scale feature fusion

UR - http://www.scopus.com/inward/record.url?scp=85153499342&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2023.110517

DO - 10.1016/j.knosys.2023.110517

M3 - Article

AN - SCOPUS:85153499342

SN - 0950-7051

VL - 271

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

M1 - 110517

ER -
