BEV-SI: A Lightweight and Efficient Split-Inception Framework for Multi-modal 3D Object Detection

  • Yifan Wu
  • Hongwen He*
  • Yingjuan Tang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Fusing LiDAR and camera data for 3D object detection remains a key challenge in autonomous driving. While most methods adopt dual-branch frameworks to extract BEV features from both modalities before fusion, progress in point cloud feature extraction still lags behind that of image-based networks, limiting overall fusion effectiveness. To address this gap, we propose BEV-SI, a novel multi-modal detection framework featuring a lightweight yet expressive LiDAR branch. At its core is the Split-Inception Block, which enhances point cloud representation by applying diverse channel-wise operations and expanding the receptive field. Furthermore, we introduce the Split-Neck module, which performs efficient multi-scale feature fusion through adaptive downsampling and Branch Attention, allowing the network to dynamically reweight spatial features across different scales. Extensive experiments on the nuScenes benchmark demonstrate that BEV-SI achieves competitive accuracy with significantly improved inference speed.
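The abstract does not say how Branch Attention computes its weights, so the following is only a generic sketch of softmax reweighting across scale branches, not the authors' implementation. The function name `branch_attention`, the use of global average pooling as the scoring step, and the single scalar weight per branch are all assumptions made for illustration; a trained network would typically use a learned scoring layer and may weight each spatial location separately.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def branch_attention(branches):
    """Fuse same-shaped 2D feature maps, one per scale branch.

    Assumption (not from the paper): each branch is scored by its
    global average activation, and the scores are normalized with a
    softmax so the branch weights sum to 1.
    """
    # One scalar descriptor per branch: global average pooling.
    scores = [
        sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in branches
    ]
    weights = softmax(scores)
    rows, cols = len(branches[0]), len(branches[0][0])
    # Weighted sum of the branches, element by element.
    fused = [
        [sum(w * fmap[r][c] for w, fmap in zip(weights, branches))
         for c in range(cols)]
        for r in range(rows)
    ]
    return fused, weights


# Two toy 2x2 maps standing in for BEV features from two scales,
# assumed to be already resampled to a common resolution.
coarse = [[1.0, 1.0], [1.0, 1.0]]
fine = [[3.0, 3.0], [3.0, 3.0]]
fused, weights = branch_attention([coarse, fine])
```

In this toy case the branch with the larger average activation receives the larger weight, and the fused map is the weighted combination of the two inputs.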

Original language: English
Title of host publication: Intelligent Vehicles - 3rd CCF Intelligent Vehicles Symposium, CIVS 2025, Revised Selected Papers
Editors: Huiyun Li, Zhongli Wang, Shuai Zhao, Peng Sun, Michael Herrmann, Xi Zheng, Yuling Liu
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 160-171
Number of pages: 12
ISBN (Print): 9789819548743
DOIs
Publication status: Published - 2026
Externally published: Yes
Event: 3rd CCF Intelligent Vehicles Symposium, CIVS 2025 - Hangzhou, China
Duration: 16 Aug 2025 to 18 Aug 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2631 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 3rd CCF Intelligent Vehicles Symposium, CIVS 2025
Country/Territory: China
City: Hangzhou
Period: 16/08/25 to 18/08/25

Keywords

  • 3D object detection
  • Autonomous driving
  • Multi-modal fusion
  • Multi-scale fusion
