TY - GEN
T1 - V2I-BEVF
T2 - 26th IEEE International Conference on Intelligent Transportation Systems, ITSC 2023
AU - Xiang, Chao
AU - Xie, Xiaopo
AU - Feng, Chen
AU - Bai, Zhen
AU - Niu, Zhendong
AU - Yang, Mingchuan
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - As one of the core modules of autonomous driving technology, environment perception has become an active research topic in industry and academia in recent years. However, self-driving vehicles face safety challenges because of perceptual blind spots and a lack of long-range sensing capability. This paper proposes V2I-BEVF, a multi-modal fusion method based on bird's-eye-view (BEV) representation for vehicle-infrastructure perception. It consists mainly of two branch networks that extract features from 2D images and 3D point clouds and transform them into BEV features, followed by a Deformable Attention Transformer that fuses and decodes these features to achieve high-precision, real-time perception of road traffic participants. The V2I-BEVF algorithm is experimentally verified on the open-source roadside DAIR-V2X-I dataset from Tsinghua University and Baidu. The experimental results show that, compared with several benchmark algorithms provided with the DAIR-V2X-I dataset, V2I-BEVF substantially improves pedestrian detection accuracy. We also verified the effectiveness of the proposed method on a dataset we collected from roadside sensor devices. The V2I-BEVF algorithm can be combined with 5G/V2X communication technology and applied to V2I collaborative perception scenarios to take full advantage of the wide field of view and small blind area of roadside perception.
AB - As one of the core modules of autonomous driving technology, environment perception has become an active research topic in industry and academia in recent years. However, self-driving vehicles face safety challenges because of perceptual blind spots and a lack of long-range sensing capability. This paper proposes V2I-BEVF, a multi-modal fusion method based on bird's-eye-view (BEV) representation for vehicle-infrastructure perception. It consists mainly of two branch networks that extract features from 2D images and 3D point clouds and transform them into BEV features, followed by a Deformable Attention Transformer that fuses and decodes these features to achieve high-precision, real-time perception of road traffic participants. The V2I-BEVF algorithm is experimentally verified on the open-source roadside DAIR-V2X-I dataset from Tsinghua University and Baidu. The experimental results show that, compared with several benchmark algorithms provided with the DAIR-V2X-I dataset, V2I-BEVF substantially improves pedestrian detection accuracy. We also verified the effectiveness of the proposed method on a dataset we collected from roadside sensor devices. The V2I-BEVF algorithm can be combined with 5G/V2X communication technology and applied to V2I collaborative perception scenarios to take full advantage of the wide field of view and small blind area of roadside perception.
UR - http://www.scopus.com/inward/record.url?scp=85186519837&partnerID=8YFLogxK
U2 - 10.1109/ITSC57777.2023.10421963
DO - 10.1109/ITSC57777.2023.10421963
M3 - Conference contribution
AN - SCOPUS:85186519837
T3 - IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC
SP - 5292
EP - 5299
BT - 2023 IEEE 26th International Conference on Intelligent Transportation Systems, ITSC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 24 September 2023 through 28 September 2023
ER -