PLPFusion: Plane-Line-Pixel Fully Sparse Fusion for Robust Multi-Modal 3D Object Detection

  • Beijing Institute of Technology
  • Research Institute of China Ordnance Industries

Research output: Contribution to journal › Article › peer-review

Abstract

Fully sparse fusion strikes an excellent balance between efficiency and accuracy in multi-modal 3D object detection. However, most existing methods focus on foreground objects and overlook background context. This oversight compromises detection robustness, especially for occluded or small objects, and leads to suboptimal detection performance. To address this limitation, we propose a novel fully sparse fusion framework, PLPFusion, which introduces a hierarchical Plane-Line-Pixel representation to progressively model object-context relationships. PLPFusion comprises three key modules: the Plane Enhancement Module (PEM), the Line Alignment Module (LAM), and the Pixel-Level Aggregation Module (PLAM). First, PEM uses geometric cues from LiDAR feature planes to generate spatially aware object queries. Second, LAM refines these queries with geometric priors to make them semantically aware. Finally, PLAM uses the semantically aware object queries to aggregate pixel-level context, enhancing discriminative completeness. On the nuScenes benchmark, PLPFusion achieves 71.9% mAP and 74.0% NDS, outperforming the FUTR3D baseline by 2.5% mAP and 1.9% NDS, respectively. On the KITTI benchmark, it achieves 72.68% BEV mAP and 67.39% 3D mAP. These results confirm its robustness and effectiveness in diverse multi-modal 3D scenarios.
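For readers unfamiliar with the NDS figure quoted above, the following is a minimal sketch of how the nuScenes detection score combines mAP with the five mean true-positive error metrics, following the public nuScenes definition. The function name `nds` and all input values are hypothetical and chosen for illustration only; they are not results from this paper.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score: (5 * mAP + sum(1 - min(1, err))) / 10.

    mean_ap   -- mAP as a fraction in [0, 1]
    tp_errors -- dict with the five mean true-positive errors
                 (mATE, mASE, mAOE, mAVE, mAAE)
    """
    if len(tp_errors) != 5:
        raise ValueError("expected five TP error metrics")
    # Each error is clamped to 1 and converted into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Hypothetical error values for illustration only, not taken from the paper.
example_errors = {
    "mATE": 0.30,  # translation error
    "mASE": 0.25,  # scale error
    "mAOE": 0.30,  # orientation error
    "mAVE": 0.25,  # velocity error
    "mAAE": 0.15,  # attribute error
}
example_nds = nds(0.70, example_errors)
```

Because mAP carries half of the total weight, a gain in mAP alone moves NDS by half as much; any remaining change in NDS comes from the true-positive error terms.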
