Abstract
Perceiving accurate geometric and semantic information from surround-view cameras is of great significance in autonomous driving and is an active area of research. In 3D semantic occupancy prediction, each surround-view camera image is commonly treated as an independent frame, and methods attempt to address the ill-posed problem of 2D-to-3D feature transformation. Our key insight is to exploit the overlap regions captured by the surround-view cameras, which provide disparity information that helps recover accurate geometric details and, in turn, supports accurate prediction of 3D semantic occupancy. In this paper, we introduce OverlapOcc, a novel framework for 3D semantic occupancy prediction that leverages the latent geometric constraints provided by the overlap regions of surround-view cameras. Specifically, multi-level features are extracted from each surround-view camera image using an image backbone. An Overlap-Image-Cross-Attention (OICA) layer based on the deformable transformer is then proposed to transform the multi-level image features into 3D voxel space, performing the 2D-to-3D transformation. The OICA layer incorporates an Overlap-Attention (OA) module to exploit the geometric prior derived from the disparity in the overlap region. Furthermore, a Spatial-Self-Attention (SSA) layer, also based on the deformable transformer, propagates the accurate geometric information from the overlap region to the global context by learning local and contextual features. The OverlapOcc framework consists of stacked OICA and SSA layers: OICA extracts accurate geometric information from the overlap region, and SSA disseminates it to the global context, yielding an accurate depiction of the overall 3D scene.
Extensive experiments on the nuScenes dataset validate the state-of-the-art performance of OverlapOcc on vision-based LiDAR semantic segmentation and 3D semantic occupancy prediction tasks. It even achieves performance comparable to some LiDAR-based segmentation methods.
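As a rough illustration of what an "overlap region" means geometrically, the sketch below flags voxel centers that fall inside the horizontal field of view of two or more cameras. It is not the OverlapOcc implementation: the rig (six cameras placed at the ego origin, evenly spaced in yaw, each with a 70-degree horizontal field of view) and the function names `cameras_seeing` and `overlap_mask` are assumptions made purely for illustration.

```python
import math

# Hypothetical rig: six cameras at the ego origin, evenly spaced in yaw,
# each with a 70-degree horizontal field of view (illustrative values only).
CAMERA_YAWS_DEG = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]
HFOV_DEG = 70.0


def cameras_seeing(x, y, yaws_deg=CAMERA_YAWS_DEG, hfov_deg=HFOV_DEG):
    """Return indices of cameras whose horizontal FOV contains point (x, y)."""
    bearing = math.degrees(math.atan2(y, x))
    visible = []
    for idx, yaw in enumerate(yaws_deg):
        # Signed angular difference wrapped into [-180, 180).
        diff = (bearing - yaw + 180.0) % 360.0 - 180.0
        if abs(diff) <= hfov_deg / 2.0:
            visible.append(idx)
    return visible


def overlap_mask(voxel_centers):
    """Flag voxel centers seen by at least two cameras (the overlap region)."""
    return [len(cameras_seeing(x, y)) >= 2 for x, y, _z in voxel_centers]
```

With this toy rig, a voxel straight ahead of the vehicle is seen by one camera only, while a voxel at a bearing of roughly 30 degrees lies in the overlap of two neighbouring cameras. Those doubly observed voxels are where disparity cues of the kind described in the abstract would be available.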
| Original language | English |
|---|---|
| Article number | 128701 |
| Journal | Expert Systems with Applications |
| Volume | 293 |
| Publication status | Published - 1 Dec 2025 |
| Externally published | Yes |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- 3D semantic occupancy prediction
- Deep learning
- LiDAR semantic segmentation
- Surround-view camera
- Transformer
Fingerprint
Dive into the research topics of 'OverlapOcc: Leveraging overlap regions of surround-view cameras for 3D semantic occupancy prediction'. Together they form a unique fingerprint.