CoFormerNet: A Transformer-Based Fusion Approach for Enhanced Vehicle-Infrastructure Cooperative Perception

Bin Li, Yanan Zhao, Huachun Tan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Vehicle–infrastructure cooperative perception is becoming increasingly crucial for autonomous driving systems, as it leverages the infrastructure's broader spatial perspective and computational resources. This paper introduces CoFormerNet, a novel framework for improving cooperative perception. CoFormerNet employs a consistent structure for both the vehicle and infrastructure branches, integrating a temporal aggregation module and spatially modulated cross-attention to fuse intermediate features at two distinct stages. This design effectively handles communication delays and spatial misalignment. Experimental results on the DAIR-V2X and V2XSet datasets demonstrate that CoFormerNet significantly outperforms existing methods, achieving state-of-the-art performance in 3D object detection.
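The abstract's core idea is that intermediate vehicle features attend to infrastructure features via cross-attention before detection. The following is a minimal sketch of that kind of feature-level fusion; the function names, shapes, and residual combination are illustrative assumptions, not the paper's exact spatially modulated cross-attention.

```python
import numpy as np

def cross_attention(query, key, value):
    # Scaled dot-product attention: vehicle tokens (query) attend to
    # infrastructure tokens (key/value). Shapes: (Nq, d), (Nk, d), (Nk, d).
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value

def fuse_features(vehicle_feat, infra_feat):
    # Hypothetical fusion step: the attended infrastructure context is
    # added residually to the vehicle branch, preserving its features.
    attended = cross_attention(vehicle_feat, infra_feat, infra_feat)
    return vehicle_feat + attended

rng = np.random.default_rng(0)
veh = rng.standard_normal((16, 64))    # 16 vehicle BEV tokens, 64-dim
infra = rng.standard_normal((32, 64))  # 32 infrastructure BEV tokens
fused = fuse_features(veh, infra)
print(fused.shape)  # (16, 64): same shape as the vehicle branch input
```

In a real cooperative-perception pipeline this fusion would run on bird's-eye-view feature maps after temporal alignment, with learned projections for query, key, and value; the sketch above omits those to keep the attention mechanics visible.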

Original language: English
Article number: 4101
Journal: Sensors
Volume: 24
Issue number: 13
Publication status: Published - Jul 2024

Keywords

  • 3D LiDAR object detection
  • V2X
  • cooperative perception

