
3D Occupancy Perception Network Based on Temporal Fusion of Bird's-Eye-View Features

  • Shaobin Wu*
  • Yixuan Li
  • Yunfeng Chu
  • Xuze Lin
  • Sheng Tan
  • Xiaoan Li

*Corresponding author for this work
Beijing Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution (peer-reviewed)

Abstract

To address the difficulty of perceiving long-tail obstacles and the high complexity of modeling dynamic environments in unmanned driving scenarios, this paper proposes a 3D occupancy perception network based on the temporal fusion of bird's-eye-view (BEV) features. First, an image backbone network extracts image features, which are mapped into BEV features; these BEV features are then fused across time with a deformable attention mechanism. Second, a dual-branch prediction structure for 3D semantic occupancy and a 2D velocity flow field is designed to decouple these heterogeneous tasks: one branch performs fine-grained voxel-level semantic occupancy prediction with 3D convolutions, while the other generates the 2D velocity flow field through a temporal cost-volume matching mechanism. This reduces competition between the tasks while maintaining real-time performance. Finally, a dynamic supervision strategy is proposed. It uses ray-extended sampling to generate supervision masks of key voxels that cover obstacles and their surrounding empty voxels, and combines random sampling of empty voxels with differentiated supervision of dynamic and static voxels, which alleviates the imbalance in the class distribution and suppresses trailing artifacts in the predictions. Experiments on the FlowOcc3D dataset and on a real vehicle show that the proposed network performs well in both semantic occupancy and velocity prediction, verifying its effectiveness across a range of driving scenarios. Its lightweight design provides reliable support for real-time environment perception and path planning in unmanned systems, and furthers the application of 3D occupancy perception technology in diverse scenarios.
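The temporal BEV fusion step can be illustrated with a minimal single-head, single-scale sketch of deformable attention over the previous frame's BEV map. It assumes the previous map has already been aligned to the current ego pose, and it takes the sampling offsets and attention logits as inputs, whereas a network would predict them from each query with learned layers. The names `bilinear_sample` and `deformable_temporal_attention` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[List[float]]]  # BEV map indexed as [y][x][channel]


def bilinear_sample(feat: Grid, y: float, x: float) -> List[float]:
    """Bilinearly sample a C-dim feature at continuous (y, x); cells outside the map count as zero."""
    height, width, channels = len(feat), len(feat[0]), len(feat[0][0])
    y0, x0 = math.floor(y), math.floor(x)
    out = [0.0] * channels
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            w = wy * wx
            if w > 0.0 and 0 <= yy < height and 0 <= xx < width:
                for c in range(channels):
                    out[c] += w * feat[yy][xx][c]
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_temporal_attention(
    prev_bev: Grid,
    ref: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    logits: Sequence[float],
) -> List[float]:
    """Aggregate K sampled previous-frame features around one query's reference point."""
    weights = softmax(logits)  # attention weights over the K sampling points sum to 1
    out = [0.0] * len(prev_bev[0][0])
    for (dy, dx), w in zip(offsets, weights):
        sample = bilinear_sample(prev_bev, ref[0] + dy, ref[1] + dx)
        for c, v in enumerate(sample):
            out[c] += w * v
    return out


# Toy 3x3 single-channel previous BEV map whose value encodes its position (10*y + x).
prev = [[[float(10 * y + x)] for x in range(3)] for y in range(3)]
fused = deformable_temporal_attention(prev, (1.0, 1.0), [(0.0, 0.0), (0.0, 0.5)], [0.0, 0.0])
print(fused)  # [11.25]
```

Computing one such output per BEV query yields a temporally aggregated map. How it is then combined with the current-frame BEV features (for example, residual addition or concatenation) is not stated in the abstract, so that step is left out here.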
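The temporal cost-volume matching used by the velocity branch can be sketched as a local correlation between the current and previous BEV feature maps. This is a simplification under stated assumptions: it correlates features within a (2R+1)×(2R+1) displacement window, scales each dot product by 1/sqrt(C), and reads out velocity with a hard argmax instead of the learned decoder a network would use. The function names and the parameters `cell_size` (metres per BEV cell) and `dt` (seconds between frames) are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]  # BEV features indexed as [y][x][channel]


def cost_volume(feat_t: Grid, feat_prev: Grid, radius: int):
    """Correlate each current cell with previous-frame cells within +/- radius cells."""
    height, width, channels = len(feat_t), len(feat_t[0]), len(feat_t[0][0])
    disps = [(dy, dx) for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)]
    scale = 1.0 / math.sqrt(channels)
    volume = []
    for y in range(height):
        row = []
        for x in range(width):
            costs = []
            for dy, dx in disps:
                py, px = y + dy, x + dx
                if 0 <= py < height and 0 <= px < width:
                    dot = sum(a * b for a, b in zip(feat_t[y][x], feat_prev[py][px]))
                    costs.append(dot * scale)
                else:
                    costs.append(float("-inf"))  # never match outside the map
            row.append(costs)
        volume.append(row)
    return disps, volume


def flow_from_costs(disps, volume, cell_size: float, dt: float) -> List[List[Tuple[float, float]]]:
    """Pick the best match per cell (ties go to the smallest displacement); return (vy, vx) in m/s."""
    flow = []
    for row in volume:
        out_row = []
        for costs in row:
            best = max(range(len(disps)),
                       key=lambda k: (costs[k], -(abs(disps[k][0]) + abs(disps[k][1]))))
            dy, dx = disps[best]
            # The best match lies at p + d in the previous frame, so the cell moved by -d per frame.
            out_row.append((-dy * cell_size / dt, -dx * cell_size / dt))
        flow.append(out_row)
    return flow


# One-channel toy example: a feature at (y=2, x=1) in the previous frame moves to (2, 2).
prev = [[[0.0] for _ in range(5)] for _ in range(5)]
curr = [[[0.0] for _ in range(5)] for _ in range(5)]
prev[2][1] = [1.0]
curr[2][2] = [1.0]
disps, vol = cost_volume(curr, prev, radius=1)
flow = flow_from_costs(disps, vol, cell_size=0.4, dt=0.5)
print(flow[2][2])  # (0.0, 0.8)
```

Matching only within a small window keeps the cost volume cheap, which is consistent with the real-time goal stated in the abstract; the window radius bounds the largest per-frame motion that can be detected.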

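The dynamic supervision strategy can be approximated by the following sketch, which is an interpretation under stated assumptions rather than the paper's exact procedure. Rays are cast from a hypothetical sensor origin through each occupied voxel centre and extended `extend` voxels in front of and behind it; a fraction `empty_ratio` of the remaining empty voxels is sampled at random; and dynamic voxels receive a larger loss weight `w_dynamic` than static ones. All function and parameter names are illustrative.

```python
import math
import random
from typing import Dict, Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def ray_extended_voxels(occupied: Iterable[Voxel], origin, shape,
                        extend: float = 2.0, step: float = 0.5) -> Set[Voxel]:
    """Voxels hit by rays from `origin` through each occupied voxel, within +/- `extend` voxels of it."""
    keys: Set[Voxel] = set()
    for vox in occupied:
        centre = [i + 0.5 for i in vox]
        delta = [c - o for c, o in zip(centre, origin)]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist == 0.0:
            keys.add(tuple(vox))
            continue
        direction = [d / dist for d in delta]
        start, end = max(0.0, dist - extend), dist + extend
        for i in range(int((end - start) / step) + 1):  # march along the ray in `step` increments
            t = start + i * step
            idx = tuple(math.floor(o + t * d) for o, d in zip(origin, direction))
            if all(0 <= v < n for v, n in zip(idx, shape)):
                keys.add(idx)
    return keys


def supervision_weights(occupied: Set[Voxel], dynamic: Set[Voxel], origin, shape,
                        extend: float = 2.0, empty_ratio: float = 0.1,
                        w_dynamic: float = 2.0, w_static: float = 1.0,
                        seed: int = 0) -> Dict[Voxel, float]:
    """Map each supervised voxel to a loss weight; voxels absent from the dict are not supervised."""
    keys = ray_extended_voxels(occupied, origin, shape, extend) | set(occupied)
    all_voxels = [(x, y, z) for x in range(shape[0]) for y in range(shape[1]) for z in range(shape[2])]
    rest = [v for v in all_voxels if v not in keys]
    rng = random.Random(seed)
    sampled = rng.sample(rest, int(empty_ratio * len(rest)))  # random subset of far-away empty voxels
    mask = keys | set(sampled)
    return {v: (w_dynamic if v in dynamic else w_static) for v in mask}


weights = supervision_weights({(5, 5, 0)}, dynamic={(5, 5, 0)}, origin=(0.5, 5.5, 0.5),
                              shape=(10, 10, 1), empty_ratio=0.0)
print(sorted(weights.items()))
# [((3, 5, 0), 1.0), ((4, 5, 0), 1.0), ((5, 5, 0), 2.0), ((6, 5, 0), 1.0), ((7, 5, 0), 1.0)]
```

Supervising only the ray neighbourhood of obstacles plus a random fraction of distant free space keeps the loss from being dominated by empty voxels, which is the class-imbalance problem the abstract names; weighting dynamic voxels more heavily is one simple way to realize "differentiated supervision".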
Original language: English
Title of host publication: Proceedings of 2025 IEEE International Conference on Unmanned Systems, ICUS 2025
Editors: Rong Song
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 686-694
Number of pages: 9
ISBN (Electronic): 9798331526726
Publication status: Published (2025)
Event: 2025 IEEE International Conference on Unmanned Systems, ICUS 2025, Changzhou, China
Duration: 18 Sept 2025 to 19 Sept 2025

Publication series

Name: Proceedings of 2025 IEEE International Conference on Unmanned Systems, ICUS 2025

Conference

Conference: 2025 IEEE International Conference on Unmanned Systems, ICUS 2025
Country/Territory: China
City: Changzhou
Period: 18/09/25 to 19/09/25

Keywords

  • 3D occupancy
  • environment perception
  • temporal fusion
  • unmanned driving
