TY - JOUR
T1 - RADCI
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
AU - Yu, Heng
AU - Zhang, Ruiheng
AU - Sun, Haoyang
AU - Cao, Zhe
AU - Yang, Biwen
AU - Zhang, Jin
AU - Liu, Guanyu
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - High-quality perception is crucial in autonomous driving and monitoring systems, where millimeter-wave radar and infrared cameras play important roles thanks to their robustness and reliability under harsh conditions. Both sensors can serve as low-cost supplements to optical image detection, improving overall system robustness. However, widely applicable feature-level fusion methods and multimodal datasets that effectively integrate visible-light imagery with these two heterogeneous data types across multiple tasks are still lacking. In this work, we collect a new multimodal dataset, RADCI8, which provides synchronized data from a visible-light camera, an infrared camera, and a radar for target detection and tracking. The dataset includes 2D image annotations, radar RAD tensors carrying range, angle, and Doppler information, and target ID annotations in both data formats. In addition, to address the incomplete use of radar data in previous fusion algorithms, we propose a detection method that fuses image and radar features through feature concatenation and an attention mechanism. Our proposed algorithm achieves 51.5% AP at IoU 0.50:0.95 on 2D bounding box prediction, significantly improving average detection accuracy over vision-based methods and remaining robust even when a single sensor degrades.
AB - High-quality perception is crucial in autonomous driving and monitoring systems, where millimeter-wave radar and infrared cameras play important roles thanks to their robustness and reliability under harsh conditions. Both sensors can serve as low-cost supplements to optical image detection, improving overall system robustness. However, widely applicable feature-level fusion methods and multimodal datasets that effectively integrate visible-light imagery with these two heterogeneous data types across multiple tasks are still lacking. In this work, we collect a new multimodal dataset, RADCI8, which provides synchronized data from a visible-light camera, an infrared camera, and a radar for target detection and tracking. The dataset includes 2D image annotations, radar RAD tensors carrying range, angle, and Doppler information, and target ID annotations in both data formats. In addition, to address the incomplete use of radar data in previous fusion algorithms, we propose a detection method that fuses image and radar features through feature concatenation and an attention mechanism. Our proposed algorithm achieves 51.5% AP at IoU 0.50:0.95 on 2D bounding box prediction, significantly improving average detection accuracy over vision-based methods and remaining robust even when a single sensor degrades.
KW - Dataset
KW - Multimodal
KW - Radar Processing
KW - Sensor Fusion
UR - https://www.scopus.com/pages/publications/105009586674
U2 - 10.1109/ICASSP49660.2025.10890097
DO - 10.1109/ICASSP49660.2025.10890097
M3 - Conference article
AN - SCOPUS:105009586674
SN - 0736-7791
JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
Y2 - 6 April 2025 through 11 April 2025
ER -
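
Note: the abstract describes the fusion step only at a high level (feature concatenation followed by an attention mechanism). The following is a minimal sketch of that general pattern, not the authors' implementation; the module name, channel counts, squeeze-and-excitation style attention, and toy shapes are all assumptions made purely for illustration.

# Illustrative sketch only: concatenation + channel-attention fusion of image and
# radar feature maps. NOT the method from the cited paper; shapes and the SE-style
# attention block are assumptions.
import torch
import torch.nn as nn


class ConcatAttentionFusion(nn.Module):
    """Fuse an image feature map and a radar feature map (assumed already
    spatially aligned to the same H x W grid) by channel concatenation
    followed by squeeze-and-excitation style channel attention."""

    def __init__(self, img_channels: int, radar_channels: int, reduction: int = 4):
        super().__init__()
        fused = img_channels + radar_channels
        # Channel attention: global average pool -> bottleneck MLP -> sigmoid gates.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        # 1x1 conv projects the re-weighted channels back to the image feature width.
        self.project = nn.Conv2d(fused, img_channels, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, radar_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([img_feat, radar_feat], dim=1)   # concatenate along channel dim
        x = x * self.attention(x)                       # per-channel re-weighting
        return self.project(x)                          # fused feature for a detector head


if __name__ == "__main__":
    # Toy dimensions; the real feature sizes of the method are not given in the record.
    fusion = ConcatAttentionFusion(img_channels=256, radar_channels=64)
    img = torch.randn(2, 256, 40, 40)
    radar = torch.randn(2, 64, 40, 40)
    print(fusion(img, radar).shape)  # torch.Size([2, 256, 40, 40])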