TY - JOUR
T1 - Multi-Calib
T2 - A Scalable LiDAR–Camera Calibration Network for Variable Sensor Configurations
AU - Hu, Leyun
AU - Wei, Chao
AU - Wang, Meijing
AU - Wu, Zengbin
AU - Xu, Yang
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2025/12
Y1 - 2025/12
N2 - Traditional calibration methods rely on precise targets and frequent manual intervention, making them time-consuming and unsuitable for large-scale deployment. Existing learning-based approaches automate the process but are typically limited to a single LiDAR–camera pair, which results in poor scalability and high computational overhead. To address these limitations, we propose a lightweight calibration network that is flexible in the number of sensor pairs and can jointly calibrate multiple cameras and LiDARs in a single forward pass. Our method employs a frozen pre-trained Swin Transformer as a shared backbone to extract unified features from both RGB images and the corresponding depth maps. We further introduce a cross-modal channel-wise attention module to strengthen the alignment of key features and suppress irrelevant noise. To handle variations in viewpoint, we design a modular calibration head that independently estimates the extrinsics for each LiDAR–camera pair. In large-scale experiments on the nuScenes dataset, our model, with only 78.79 M parameters, attains a mean translation error of 2.651 cm and a rotation error of (Formula presented.), matching the performance of existing methods while significantly reducing the computational cost.
AB - Traditional calibration methods rely on precise targets and frequent manual intervention, making them time-consuming and unsuitable for large-scale deployment. Existing learning-based approaches automate the process but are typically limited to a single LiDAR–camera pair, which results in poor scalability and high computational overhead. To address these limitations, we propose a lightweight calibration network that is flexible in the number of sensor pairs and can jointly calibrate multiple cameras and LiDARs in a single forward pass. Our method employs a frozen pre-trained Swin Transformer as a shared backbone to extract unified features from both RGB images and the corresponding depth maps. We further introduce a cross-modal channel-wise attention module to strengthen the alignment of key features and suppress irrelevant noise. To handle variations in viewpoint, we design a modular calibration head that independently estimates the extrinsics for each LiDAR–camera pair. In large-scale experiments on the nuScenes dataset, our model, with only 78.79 M parameters, attains a mean translation error of 2.651 cm and a rotation error of (Formula presented.), matching the performance of existing methods while significantly reducing the computational cost.
KW - cross-modal channel-wise attention
KW - deep learning
KW - multi-LiDAR–camera calibration
UR - https://www.scopus.com/pages/publications/105024636474
U2 - 10.3390/s25237321
DO - 10.3390/s25237321
M3 - Article
C2 - 41374696
AN - SCOPUS:105024636474
SN - 1424-8220
VL - 25
JO - Sensors
JF - Sensors
IS - 23
M1 - 7321
ER -