Abstract
Mamba, based on the state space model (SSM), offers an efficient alternative to the quadratic complexity of attention, showing promise for long-sequence data processing and global modeling in 3D object detection. However, applying it to this domain presents specific challenges: traditional serialization methods can compromise the spatial structure of 3D data, and the standard single-layer SSM design may limit cross-layer feature extraction. To address these issues, this paper proposes MSHI-Mamba, a Mamba-based multi-stage hierarchical interaction architecture for 3D backbone networks. We introduce a cross-layer complementary cross-attention module (C3AM) to mitigate feature redundancy in cross-layer encoding, as well as a bi-shift scanning strategy (BSS) that uses hybrid space-filling curves with shift scanning to better preserve spatial continuity and expand the receptive field during serialization. We also develop a voxel densifying downsampling module (VD-DS) to enhance local spatial information and foreground feature density. Experimental results obtained on the KITTI and nuScenes datasets demonstrate that our approach achieves competitive performance, with a 4.2% improvement in the mAP on KITTI, validating the effectiveness of the proposed components.
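As a rough illustration of what serializing voxels along a space-filling curve with a shifted scan can look like, the sketch below orders voxel coordinates by a Z-order (Morton) key, once as given and once after offsetting every coordinate. It is not the paper's BSS implementation: the abstract does not specify which curves are combined or how the shift is applied, so the choice of a Z-order curve, the function names `morton_key` and `serialize`, and the `shift` parameter are all assumptions made for this example.

```python
from typing import List, Tuple

Voxel = Tuple[int, int, int]


def morton_key(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key


def serialize(voxels: List[Voxel], shift: int = 0, bits: int = 10) -> List[int]:
    """Return voxel indices ordered along a Z-order curve.

    `shift` (hypothetical parameter) offsets every coordinate before the key
    is computed, which moves the curve's block boundaries so that a second
    scan groups different neighbours together.
    """
    keys = [morton_key(x + shift, y + shift, z + shift, bits) for x, y, z in voxels]
    return sorted(range(len(voxels)), key=keys.__getitem__)


# Toy non-negative integer voxel coordinates (illustrative data only).
voxels = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 0, 0)]
order_plain = serialize(voxels)             # [0, 1, 2, 3, 4]
order_shifted = serialize(voxels, shift=1)  # [0, 1, 4, 2, 3]
```

In this toy input, the unshifted scan places voxel `(2, 0, 0)` last, after `(1, 1, 0)`, while the shifted scan places it directly after its spatial neighbour `(1, 0, 0)`. This is the general reason for running more than one scan order over the same voxels: neighbours split by one curve's block boundaries can end up adjacent in another.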
| Original language | English |
|---|---|
| Article number | 1189 |
| Journal | Applied Sciences (Switzerland) |
| Volume | 16 |
| Issue number | 3 |
| Publication status | Published - Feb 2026 |
| Externally published | Yes |
Keywords
- 3D object detection
- Mamba
- autonomous driving
- cross-attention mechanism
- space-filling curve
- state space model