Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection

Shihao Wang, Xiaohui Jiang, Ying Li*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

1 Citation (Scopus)

Abstract

The dominant multi-camera 3D detection paradigm is based on explicit 3D feature construction, which requires complicated indexing of local image-view features via 3D-to-2D projection. Other methods, such as PETR, implicitly introduce geometric positional encoding and perform global attention to build the relationship between image tokens and 3D objects. The 3D-to-2D perspective inconsistency and global attention lead to a weak correlation between foreground tokens and queries, resulting in slow convergence. We propose Focal-PETR, which uses instance-guided supervision and a spatial alignment module to adaptively focus object queries on discriminative foreground regions. Focal-PETR additionally introduces a down-sampling strategy to reduce the cost of global attention. Our model achieves leading performance on the large-scale nuScenes benchmark and a superior speed of 30 FPS on a single RTX 3090 GPU. Extensive experiments show that our method outperforms PETR while consuming 3x fewer training hours. The code is made publicly available.
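
To illustrate why down-sampling image tokens lowers the cost of global attention, here is a minimal sketch. The abstract does not say how tokens are chosen, so keeping the top-scoring tokens under a per-token foreground score is an assumption made only for illustration. The names `select_topk_tokens`, `attention_pairs`, and `keep_ratio` are hypothetical and not taken from the authors' code.

```python
from typing import List, Sequence


def select_topk_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring tokens, in their original order.

    Assumption: `scores` holds one foreground score per image token.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, int(len(scores) * keep_ratio))
    # Rank token indices by descending score, keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def attention_pairs(num_queries: int, num_tokens: int) -> int:
    """Number of query-token pairs scored by one global cross-attention layer."""
    return num_queries * num_tokens


# Toy example: 8 image tokens with made-up foreground scores, 4 object queries.
scores = [0.9, 0.1, 0.8, 0.05, 0.3, 0.7, 0.2, 0.6]
kept = select_topk_tokens(scores, keep_ratio=0.5)

full_cost = attention_pairs(num_queries=4, num_tokens=len(scores))
reduced_cost = attention_pairs(num_queries=4, num_tokens=len(kept))
print(kept, full_cost, reduced_cost)
```

Because every query attends to every retained token, the number of query-token pairs falls in proportion to the fraction of tokens kept.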

Original language: English
Pages (from-to): 1481-1489
Number of pages: 9
Journal: IEEE Transactions on Intelligent Vehicles
Volume: 9
Issue number: 1
Publication status: Published, 1 Jan 2024

Keywords

  • 3D Object Detection
  • Autonomous Driving
  • Detection Transformer
