TY - GEN
T1 - Depth-Assisted Camera-Based Bird's Eye View Perception for Autonomous Driving
AU - Guo, Shangwei
AU - Lu, Jin
AU - Lai, Zhengchao
AU - Li, Jun
AU - Han, Shaokun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Vision-centric Bird's Eye View (BEV) perception, encompassing object detection and map segmentation, plays a pivotal role in providing crucial 3D environmental information for autonomous driving decisions. However, due to the inherent absence of depth information in 2D images, the conversion of perspective views to BEV poses challenges and hinders the performance of camera-based BEV perception in comparison to methods equipped with depth sensors. In this paper, we propose an innovative approach that integrates depth estimation into camera-based BEV perception. By employing a depth estimation network, the method enhances the 2D-to-3D feature transformation. Specifically, our method consists of a depth estimation branch and a BEV perception branch. The input image is fed into a shared image encoder to extract multi-scale features. In the depth estimation branch, these features are passed through the depth decoder to generate a depth map, which, in combination with sequential images and relative pose information, forms the basis of a photometric reprojection error that supervises the branch. To address the challenge of scale ambiguity in monocular depth estimation, we incorporate ground-truth trajectory information collected by an IMU to constrain the predicted depth values, ensuring that the predicted depth is scale-aware. In the BEV perception branch, the aforementioned multi-scale features are projected into 3D space along the perspective rays, with the assistance of depth information derived from the depth estimation branch. Subsequently, the 3D features are collapsed along the vertical axis to generate BEV features, which are further input into a task-specific head after feature extraction. Experimental results on the nuScenes dataset demonstrate that our proposed method effectively enhances the performance of BEV-based object detection and map semantic segmentation by 2.8% and 2.2%, respectively.
KW - BEV Perception
KW - Depth Prediction
KW - Map Segmentation
KW - Object Detection
UR - http://www.scopus.com/inward/record.url?scp=85186088866&partnerID=8YFLogxK
U2 - 10.1109/ITAIC58329.2023.10408989
DO - 10.1109/ITAIC58329.2023.10408989
M3 - Conference contribution
AN - SCOPUS:85186088866
T3 - IEEE Joint International Information Technology and Artificial Intelligence Conference (ITAIC)
SP - 1429
EP - 1433
BT - IEEE ITAIC 2023 - IEEE 11th Joint International Information Technology and Artificial Intelligence Conference
A2 - Xu, Bing
A2 - Mou, Kefen
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th Joint International Information Technology and Artificial Intelligence Conference, ITAIC 2023
Y2 - 8 December 2023 through 10 December 2023
ER -