
OverlapOcc: Leveraging overlap regions of surround-view cameras for 3D semantic occupancy prediction

  • Shangwei Guo
  • Jun Li
  • Shaokun Han*
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Perceiving accurate geometric and semantic information from surround-view cameras is of great significance for autonomous driving and is an active area of research. In 3D semantic occupancy prediction, each surround-view camera image is commonly treated as an independent frame, and methods attempt to solve the ill-posed problem of transforming 2D features into 3D. Our key insight is to exploit the overlap regions captured by the surround-view cameras, which provide disparity information that helps recover accurate geometric details and, in turn, contributes to accurate 3D semantic occupancy prediction. In this paper, we introduce OverlapOcc, a novel framework for 3D semantic occupancy prediction that leverages the latent geometric constraints provided by the overlap regions of surround-view cameras. Specifically, multi-level features are extracted from each surround-view camera image using an image backbone. An Overlap-Image-Cross-Attention (OICA) layer based on the deformable transformer is then proposed to transform the multi-level image features into 3D voxel space, performing the 2D-to-3D transformation. The OICA layer incorporates an Overlap-Attention (OA) module to exploit the geometric prior derived from the disparity in the overlap region. Furthermore, a Spatial-Self-Attention (SSA) layer, also based on the deformable transformer, propagates the accurate geometric information from the overlap region to the global context by learning local and contextual features. The OverlapOcc framework consists of stacked OICA and SSA layers, so that the geometric information extracted by OICA from the overlap region is disseminated to the global context by SSA, yielding an accurate depiction of the overall 3D scene.
Extensive experiments on the nuScenes dataset validate the state-of-the-art performance of OverlapOcc on the vision-based LiDAR semantic segmentation and 3D semantic occupancy prediction tasks. It even achieves performance comparable to some LiDAR-based segmentation methods.
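
To make the notion of an overlap region concrete, the following is a minimal sketch, not the authors' implementation, of how 3D voxel centers that fall inside the field of view of two or more cameras could be identified with a plain pinhole projection. The function names `project_points` and `overlap_mask`, the camera tuple layout `(K, T_cam_from_world, image_size)`, and the pinhole model without lens distortion are all assumptions made for illustration; the abstract does not specify how OverlapOcc determines overlap regions.

```python
import numpy as np

def project_points(points_world, K, T_cam_from_world, image_size):
    """Project (N, 3) world points into one pinhole camera (hypothetical helper).

    K is the 3x3 intrinsic matrix, T_cam_from_world the 4x4 extrinsic matrix,
    image_size is (width, height) in pixels. Returns pixel coordinates (N, 2)
    and a boolean mask of points that lie in front of the camera and inside
    the image bounds.
    """
    n = points_world.shape[0]
    homo = np.hstack([points_world, np.ones((n, 1))])   # (N, 4) homogeneous
    cam = (T_cam_from_world @ homo.T).T[:, :3]          # (N, 3) camera frame
    depth = cam[:, 2]
    in_front = depth > 1e-6
    # Avoid dividing by zero or negative depth; those points are masked out.
    safe_depth = np.where(in_front, depth, 1.0)
    pix = (K @ cam.T).T                                 # (N, 3)
    uv = pix[:, :2] / safe_depth[:, None]
    width, height = image_size
    inside = (
        (uv[:, 0] >= 0) & (uv[:, 0] < width)
        & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    )
    return uv, in_front & inside

def overlap_mask(points_world, cameras):
    """Return True for points visible in at least two cameras.

    `cameras` is a list of (K, T_cam_from_world, image_size) tuples; this
    layout is an assumption of the sketch, not part of the paper.
    """
    counts = np.zeros(points_world.shape[0], dtype=int)
    for K, T_cam_from_world, image_size in cameras:
        _, visible = project_points(points_world, K, T_cam_from_world, image_size)
        counts += visible
    return counts >= 2
```

In a pipeline of the kind the abstract describes, a mask like this could select the voxel queries for which features from two views are available, which is where disparity cues exist. How OverlapOcc actually uses those queries inside its OA module is not stated in the abstract.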

Original language: English
Article number: 128701
Journal: Expert Systems with Applications
Volume: 293
Publication status: Published - 1 Dec 2025
Externally published: Yes

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • 3D semantic occupancy prediction
  • Deep learning
  • LiDAR semantic segmentation
  • Surround-view camera
  • Transformer
