CoFormerNet: A Transformer-Based Fusion Approach for Enhanced Vehicle-Infrastructure Cooperative Perception

Bin Li, Yanan Zhao, Huachun Tan*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Vehicle–infrastructure cooperative perception is becoming increasingly crucial for autonomous driving systems, as it leverages the infrastructure's broader spatial perspective and computational resources. This paper introduces CoFormerNet, a novel framework for improving cooperative perception. CoFormerNet employs a consistent structure for both the vehicle and infrastructure branches, integrating a temporal aggregation module and spatially modulated cross-attention to fuse intermediate features at two distinct stages. This design effectively handles communication delays and spatial misalignment. Experimental results on the DAIR-V2X and V2XSet datasets demonstrate that CoFormerNet significantly outperforms existing methods, achieving state-of-the-art performance in 3D object detection.
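The abstract's core idea is that intermediate vehicle features attend to infrastructure features via cross-attention before detection. The following is a minimal sketch of that kind of feature-level fusion; the function names, shapes, and residual combination are illustrative assumptions, not the paper's exact spatially modulated cross-attention.

```python
import numpy as np

def cross_attention(query, key, value):
    # Scaled dot-product attention: vehicle tokens (query) attend to
    # infrastructure tokens (key/value). Shapes: (Nq, d), (Nk, d), (Nk, d).
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value

def fuse_features(vehicle_feat, infra_feat):
    # Hypothetical fusion step: the attended infrastructure context is
    # added residually to the vehicle branch, preserving its features.
    attended = cross_attention(vehicle_feat, infra_feat, infra_feat)
    return vehicle_feat + attended

rng = np.random.default_rng(0)
veh = rng.standard_normal((16, 64))    # 16 vehicle BEV tokens, 64-dim
infra = rng.standard_normal((32, 64))  # 32 infrastructure BEV tokens
fused = fuse_features(veh, infra)
print(fused.shape)  # (16, 64): same shape as the vehicle branch input
```

In a real cooperative-perception pipeline this fusion would run on bird's-eye-view feature maps after temporal alignment, with learned projections for query, key, and value; the sketch above omits those to keep the attention mechanics visible.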

Original language: English
Article number: 4101
Journal: Sensors
Volume: 24
Issue number: 13
Publication status: Published - Jul 2024

Keywords

  • 3D LiDAR object detection
  • V2X
  • cooperative perception

