TY - GEN
T1 - Depth-Assisted Camera-Based Bird's Eye View Perception for Autonomous Driving
AU - Guo, Shangwei
AU - Lu, Jin
AU - Lai, Zhengchao
AU - Li, Jun
AU - Han, Shaokun
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Vision-centric Bird's Eye View (BEV) perception, encompassing object detection and map segmentation, plays a pivotal role in providing crucial 3D environmental information for autonomous driving decisions. However, due to the inherent absence of depth information in 2D images, the conversion of perspective views to BEV poses challenges and hinders the performance of camera-based BEV perception in comparison to methods equipped with depth sensors. In this paper, we propose an innovative approach that integrates depth estimation into camera-based BEV perception. By employing a depth estimation network, the method enhances the 2D-to-3D feature transformation. Specifically, our method consists of a depth estimation branch and a BEV perception branch. The input image is fed into a shared image encoder to extract multi-scale features. In the depth estimation branch, these features are passed through the depth decoder to generate a depth map, which, in combination with sequential images and relative pose information, forms the basis of a photometric reprojection error that supervises the branch. To address the challenge of scale ambiguity in monocular depth estimation, we incorporate ground-truth trajectory information collected by an IMU to constrain the predicted depth values, ensuring that the predicted depth is scale-aware. In the BEV perception branch, the aforementioned multi-scale features are projected into 3D space along the perspective rays, with the assistance of depth information derived from the depth estimation branch. Subsequently, the 3D features are collapsed along the vertical axis to generate BEV features, which are further input into a task-specific head after feature extraction. Experimental results on the nuScenes dataset demonstrate that our proposed method effectively enhances the performance of BEV-based object detection and map semantic segmentation by 2.8% and 2.2%, respectively.
KW - BEV Perception
KW - Depth Prediction
KW - Map Segmentation
KW - Object Detection
UR - http://www.scopus.com/inward/record.url?scp=85186088866&partnerID=8YFLogxK
U2 - 10.1109/ITAIC58329.2023.10408989
DO - 10.1109/ITAIC58329.2023.10408989
M3 - Conference contribution
AN - SCOPUS:85186088866
T3 - IEEE Joint International Information Technology and Artificial Intelligence Conference (ITAIC)
SP - 1429
EP - 1433
BT - IEEE ITAIC 2023 - IEEE 11th Joint International Information Technology and Artificial Intelligence Conference
A2 - Xu, Bing
A2 - Mou, Kefen
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th Joint International Information Technology and Artificial Intelligence Conference, ITAIC 2023
Y2 - 8 December 2023 through 10 December 2023
ER -