Abstract
Vehicle–infrastructure cooperative perception is becoming increasingly important for autonomous driving systems, as it leverages the infrastructure’s broader spatial perspective and computational resources. This paper introduces CoFormerNet, a novel framework for improving cooperative perception. CoFormerNet employs a consistent structure for both the vehicle and infrastructure branches, integrating a temporal aggregation module and spatial-modulated cross-attention to fuse intermediate features at two distinct stages. This design effectively handles communication delays and spatial misalignment. Experimental results on the DAIR-V2X and V2XSet datasets demonstrated that CoFormerNet significantly outperformed existing methods, achieving state-of-the-art performance in 3D object detection.
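To make the two-stage intermediate fusion described above more concrete, the sketch below illustrates one plausible layout: a temporal aggregation module that summarizes a short history of delayed infrastructure BEV features, followed by a spatially modulated cross-attention step that injects the aggregated context into the vehicle branch. This is an illustrative sketch only, not the authors' implementation; all module names, tensor shapes, and hyperparameters are assumptions.

```python
# Illustrative-only sketch (not the paper's code): two-branch cooperative fusion
# with (1) temporal aggregation over delayed infrastructure frames and
# (2) spatially modulated cross-attention on BEV feature maps.
import torch
import torch.nn as nn


class TemporalAggregation(nn.Module):
    """Aggregate a short history of infrastructure BEV features with a GRU,
    so a delayed frame is compensated by its recent temporal context."""

    def __init__(self, channels: int):
        super().__init__()
        self.gru = nn.GRU(input_size=channels, hidden_size=channels, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, C, H, W) -> treat every BEV cell as an independent sequence
        b, t, c, h, w = feats.shape
        seq = feats.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        _, last = self.gru(seq)  # final hidden state per BEV cell
        return last.squeeze(0).reshape(b, h, w, c).permute(0, 3, 1, 2)


class SpatiallyModulatedCrossAttention(nn.Module):
    """Cross-attention from vehicle BEV queries to infrastructure BEV keys/values,
    gated by a learned spatial map that down-weights misaligned regions."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, veh: torch.Tensor, infra: torch.Tensor) -> torch.Tensor:
        b, c, h, w = veh.shape
        q = veh.flatten(2).transpose(1, 2)    # (B, H*W, C) vehicle queries
        kv = infra.flatten(2).transpose(1, 2) # (B, H*W, C) infrastructure keys/values
        fused, _ = self.attn(q, kv, kv)
        fused = fused.transpose(1, 2).reshape(b, c, h, w)
        gate = self.gate(torch.cat([veh, fused], dim=1))  # spatial modulation map
        return veh + gate * fused  # keep vehicle features, add gated infrastructure context


if __name__ == "__main__":
    B, T, C, H, W = 2, 3, 64, 32, 32
    infra_hist = torch.randn(B, T, C, H, W)  # delayed infrastructure BEV frames
    veh_bev = torch.randn(B, C, H, W)        # current vehicle BEV features
    infra_bev = TemporalAggregation(C)(infra_hist)
    out = SpatiallyModulatedCrossAttention(C)(veh_bev, infra_bev)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

In this layout, the gating convolution is one simple way to realize "spatial modulation": regions where the warped infrastructure features disagree with the vehicle view receive a low weight, which mirrors the paper's stated goal of tolerating communication delay and spatial misalignment.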
| Field | Value |
|---|---|
| Original language | English |
| Article number | 4101 |
| Journal | Sensors |
| Volume | 24 |
| Issue number | 13 |
| DOIs | |
| Publication status | Published - Jul 2024 |
Keywords
- 3D LiDAR object detection
- V2X
- cooperative perception