Abstract
This paper proposes a novel multimodal collaborative perception framework to enhance the situational awareness of autonomous vehicles. First, a multimodal fusion baseline system is built that effectively integrates Light Detection and Ranging (LiDAR) point clouds and camera images; this system serves as the benchmark against which subsequent experiments are compared. Second, several well-known feature fusion strategies are investigated in collaborative scenarios, including channel-wise concatenation, element-wise summation, and transformer-based methods. The goal is to seamlessly integrate intermediate representations from different sensor modalities and to provide a thorough assessment of their effects on model performance. Extensive experiments on the large-scale open-source simulation dataset OPV2V show that attention-based multimodal fusion outperforms the alternative strategies, delivering more precise target localization in complex traffic scenarios and thereby enhancing the safety and reliability of autonomous driving systems.
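To make the three fusion strategies concrete, the sketch below shows one possible PyTorch formulation of channel-wise concatenation, element-wise summation, and attention-based fusion of intermediate LiDAR and camera feature maps. This is a minimal illustration, not the paper's implementation: it assumes both modalities have already been projected to a shared bird's-eye-view feature grid of shape (B, C, H, W), and the class names, head count, and dimensions are illustrative assumptions.

```python
# Illustrative sketch (assumed shapes, not the paper's code): three fusion
# strategies for intermediate LiDAR and camera features on a shared BEV grid.
import torch
import torch.nn as nn


class ConcatFusion(nn.Module):
    """Channel-wise concatenation, followed by a 1x1 conv to restore the channel count."""
    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, lidar_feat, cam_feat):
        return self.reduce(torch.cat([lidar_feat, cam_feat], dim=1))


class SumFusion(nn.Module):
    """Element-wise summation; assumes both feature maps share shape (B, C, H, W)."""
    def forward(self, lidar_feat, cam_feat):
        return lidar_feat + cam_feat


class AttentionFusion(nn.Module):
    """Transformer-style fusion: each BEV cell self-attends over its two modality tokens."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, lidar_feat, cam_feat):
        b, c, h, w = lidar_feat.shape
        # Stack the two modalities as a length-2 token sequence per spatial location.
        tokens = torch.stack([lidar_feat, cam_feat], dim=-1)          # (B, C, H, W, 2)
        tokens = tokens.permute(0, 2, 3, 4, 1).reshape(b * h * w, 2, c)
        fused, _ = self.attn(tokens, tokens, tokens)                  # attention over modalities
        fused = fused.mean(dim=1)                                     # pool the two modality tokens
        return fused.reshape(b, h, w, c).permute(0, 3, 1, 2)


if __name__ == "__main__":
    lidar = torch.randn(2, 64, 100, 100)
    cam = torch.randn(2, 64, 100, 100)
    print(AttentionFusion(64)(lidar, cam).shape)  # torch.Size([2, 64, 100, 100])
```

In this reading, concatenation and summation treat the modalities symmetrically and statically, while the attention variant lets each spatial cell weight the LiDAR and camera contributions adaptively, which is consistent with the abstract's finding that attention-based fusion localizes targets more precisely.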
| Translated title of the contribution | Collaborative Perception Method Based on Multisensor Fusion |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 87-96 |
| Number of pages | 10 |
| Journal | Journal of Radars |
| Volume | 13 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 2024 |