BEVHeight++: Toward Robust Visual Centric 3D Object Detection

  • Lei Yang
  • Tao Tang
  • Jun Li
  • Kun Yuan
  • Kai Wu
  • Peng Chen
  • Li Wang
  • Yi Huang
  • Lei Li
  • Xinyu Zhang*
  • Kaicheng Yu

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

While most recent autonomous driving systems focus on developing perception methods for ego-vehicle sensors, an alternative approach is often overlooked: leveraging intelligent roadside cameras to extend perception beyond the visual range. We find that state-of-the-art vision-centric detection methods perform poorly on roadside cameras. This is because these methods mainly focus on recovering depth relative to the camera center, where the depth difference between a car and the ground shrinks quickly as distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight++, to address this issue. In essence, we regress the height to the ground, which yields a distance-agnostic formulation that eases the optimization of camera-only perception methods. By incorporating both height and depth encoding techniques, we achieve a more accurate and robust projection from 2D image space to bird's-eye-view (BEV) space. On popular 3D detection benchmarks for roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. In the ego-vehicle scenario, BEVHeight++ surpasses depth-only methods with increases of +2.8% NDS and +1.7% mAP on the nuScenes test set, and even higher gains of +9.3% NDS and +8.8% mAP on the nuScenes-C benchmark with object-level distortion. Consistent and substantial performance improvements are also achieved across the KITTI, KITTI-360, and Waymo datasets.
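The geometric intuition in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a pinhole camera, a flat ground plane at z = 0, and "depth" read as Euclidean range to the camera center. All function names, the frame conventions, and the example values in the usage note are hypothetical. `range_gap` shows how the range difference between a ground point and a point above it shrinks with distance while their height difference stays fixed, and `lift_pixel_with_height` shows how a pixel with a known height can be placed in 3D by intersecting its viewing ray with a horizontal plane.

```python
import numpy as np


def range_gap(x, cam_height, obj_height):
    """Range difference (distance to the camera center) between a ground
    point and a point obj_height above it, both at horizontal distance x."""
    ground = np.hypot(x, cam_height)
    roof = np.hypot(x, cam_height - obj_height)
    return ground - roof


def pitched_camera_rotation(pitch):
    """Camera-to-ground rotation for a camera looking along ground +x and
    pitched down by `pitch` radians (hypothetical convention).
    Ground frame: x forward, y left, z up. Camera frame: x right, y down,
    z forward. Columns are the camera axes expressed in the ground frame."""
    c, s = np.cos(pitch), np.sin(pitch)
    x_axis = np.array([0.0, -1.0, 0.0])
    y_axis = np.array([-s, 0.0, -c])
    z_axis = np.array([c, 0.0, -s])
    return np.column_stack([x_axis, y_axis, z_axis])


def lift_pixel_with_height(uv, height, K, R_cam2ground, cam_pos):
    """Return the 3D point (ground frame) on the viewing ray of pixel `uv`
    whose height above the ground plane z = 0 equals `height`."""
    ray_cam = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    ray_ground = R_cam2ground @ ray_cam
    if abs(ray_ground[2]) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    # Solve cam_pos.z + t * ray.z = height for the ray parameter t.
    t = (height - cam_pos[2]) / ray_ground[2]
    if t <= 0:
        raise ValueError("the requested height is not reached in front of the camera")
    return cam_pos + t * ray_ground
```

For instance, with a camera 6 m above the ground and a 1.5 m tall object, `range_gap(10.0, 6.0, 1.5)` is several times larger than `range_gap(100.0, 6.0, 1.5)`, whereas the height difference is 1.5 m at both distances. This is one way to read the claim that a height target is distance-agnostic.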

Original language: English
Pages (from-to): 5094-5111
Number of pages: 18
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 47
Issue number: 6
DOIs
Publication status: Published - 2025
Externally published: Yes

Keywords

  • 3D object detection
  • Autonomous driving
  • robustness
  • vision-centric perception
