Depth-Assisted Camera-Based Bird's Eye View Perception for Autonomous Driving

Shangwei Guo*, Jin Lu, Zhengchao Lai, Jun Li, Shaokun Han*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision-centric Bird's Eye View (BEV) perception, encompassing object detection and map segmentation, plays a pivotal role in providing the 3D environmental information on which autonomous driving decisions depend. However, because 2D images inherently lack depth information, converting perspective views to BEV is challenging, and camera-based BEV perception lags behind methods equipped with depth sensors. In this paper, we propose an approach that integrates depth estimation into camera-based BEV perception, using a depth estimation network to improve the 2D-to-3D feature transformation. Specifically, our method consists of a depth estimation branch and a BEV perception branch. The input image is fed into a shared image encoder to extract multi-scale features. In the depth estimation branch, these features are decoded into a depth map, which, combined with sequential images and relative pose information, yields a photometric reprojection error that supervises the branch. To address the scale ambiguity of monocular depth estimation, we incorporate ground-truth trajectory information collected by an IMU to constrain the predicted depth values, making the predicted depth scale-aware. In the BEV perception branch, the aforementioned multi-scale features are projected into 3D space along perspective rays with the assistance of the depth predicted by the depth estimation branch. The 3D features are then collapsed along the vertical axis to form BEV features, which, after further feature extraction, are fed into a task-specific head. Experimental results on the nuScenes dataset demonstrate that our method improves BEV-based object detection and map semantic segmentation by 2.8% and 2.2%, respectively.
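To make the self-supervised depth branch concrete, the following is a minimal PyTorch-style sketch of a photometric reprojection loss with an IMU-based scale constraint. It assumes a pinhole camera with known intrinsics, a single source frame, and an L1 photometric term only (the SSIM term and multi-scale handling common in such losses are omitted). All names (photometric_reprojection_loss, scale_alignment_loss) and tensor shapes are illustrative assumptions, not the paper's implementation; constraining the norm of the predicted inter-frame translation to the IMU trajectory is one common way to realize the scale-awareness the abstract describes.

    import torch
    import torch.nn.functional as F

    def photometric_reprojection_loss(target, source, depth, K, K_inv, T):
        # Illustrative sketch, not the authors' code.
        # target, source: (B,3,H,W) consecutive frames; depth: (B,1,H,W)
        # K, K_inv: (B,3,3) camera intrinsics; T: (B,4,4) target->source pose
        B, _, H, W = target.shape
        ys, xs = torch.meshgrid(torch.arange(H, device=target.device),
                                torch.arange(W, device=target.device),
                                indexing="ij")
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()  # (3,H,W)
        pix = pix.view(1, 3, -1).expand(B, -1, -1)                       # (B,3,HW)
        # Back-project pixels to 3D using the predicted depth: X = d * K^-1 * p
        cam = depth.view(B, 1, -1) * (K_inv @ pix)                       # (B,3,HW)
        cam = torch.cat([cam, torch.ones(B, 1, H * W, device=cam.device)], dim=1)
        # Transform into the source frame and project back to pixels
        proj = K @ (T[:, :3, :] @ cam)                                   # (B,3,HW)
        uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
        u = uv[:, 0] / (W - 1) * 2 - 1                                   # to [-1,1]
        v = uv[:, 1] / (H - 1) * 2 - 1
        grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
        warped = F.grid_sample(source, grid, padding_mode="border",
                               align_corners=True)
        return (warped - target).abs().mean()                            # L1 error

    def scale_alignment_loss(t_pred, t_gt):
        # Resolve monocular scale ambiguity: match the norm of the predicted
        # inter-frame translation to the IMU ground-truth translation.
        return (t_pred.norm(dim=-1) - t_gt.norm(dim=-1)).abs().mean()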
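Likewise, the depth-assisted 2D-to-3D lifting and vertical collapse in the BEV branch can be sketched as a Lift-Splat-style sum-pooling step. The sketch below assumes a categorical (softmaxed) per-pixel depth distribution from the depth branch and a precomputed frustum of ego-frame 3D coordinates; function and argument names (lift_to_bev, frustum, bev_res) are hypothetical, chosen only to illustrate the technique.

    import torch

    def lift_to_bev(feat, depth_prob, frustum, bev_size=200, bev_res=0.5):
        # Illustrative sketch, not the authors' code.
        # feat:       (B,C,H,W)   image features from the shared encoder
        # depth_prob: (B,D,H,W)   softmaxed depth distribution per pixel
        # frustum:    (D,H,W,3)   precomputed ego-frame xyz per (depth bin, pixel)
        # returns:    (B,C,bev_size,bev_size) BEV features
        B, C, H, W = feat.shape
        # Outer product: place each pixel's feature at every depth bin along
        # its perspective ray, weighted by how likely that depth is.
        vol = depth_prob.unsqueeze(1) * feat.unsqueeze(2)        # (B,C,D,H,W)
        vol = vol.permute(0, 2, 3, 4, 1).reshape(B, -1, C)       # (B,D*H*W,C)
        # Map ego xyz to BEV cell indices; dropping z collapses the height axis.
        xy = frustum[..., :2].reshape(-1, 2)                     # (D*H*W,2)
        ix = (xy[:, 0] / bev_res).long() + bev_size // 2
        iy = (xy[:, 1] / bev_res).long() + bev_size // 2
        keep = (ix >= 0) & (ix < bev_size) & (iy >= 0) & (iy < bev_size)
        flat = iy * bev_size + ix                                # (D*H*W,)
        bev = feat.new_zeros(B, bev_size * bev_size, C)
        for b in range(B):                                       # sum-pool per cell
            bev[b].index_add_(0, flat[keep], vol[b][keep])
        return bev.permute(0, 2, 1).reshape(B, C, bev_size, bev_size)

The resulting BEV feature map would then pass through a BEV feature extractor and a task-specific head (detection or segmentation), as the abstract describes.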

Original language: English
Title of host publication: IEEE ITAIC 2023 - IEEE 11th Joint International Information Technology and Artificial Intelligence Conference
Editors: Bing Xu, Kefen Mou
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1429-1433
Number of pages: 5
ISBN (Electronic): 9798350333664
DOIs
Publication status: Published - 2023
Event: 11th Joint International Information Technology and Artificial Intelligence Conference, ITAIC 2023 - Chongqing, China
Duration: 8 Dec 2023 – 10 Dec 2023

Publication series

Name: IEEE Joint International Information Technology and Artificial Intelligence Conference (ITAIC)
ISSN (Print): 2693-2865

Conference

Conference: 11th Joint International Information Technology and Artificial Intelligence Conference, ITAIC 2023
Country/Territory: China
City: Chongqing
Period: 8/12/23 – 10/12/23

Keywords

  • BEV Perception
  • Depth Prediction
  • Map Segmentation
  • Object Detection
