Unsupervised Optical-Sensor Extrinsic Calibration via Dual-Transformer Alignment

  • Yuhao Wang
  • Yong Zuo*
  • Yi Tang*
  • Xiaobin Hong
  • Jian Wu
  • Ziyu Bian

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate extrinsic calibration between optical sensors, such as cameras and LiDAR, is crucial for multimodal perception. Traditional methods based on specific calibration targets exhibit poor robustness in complex optical environments such as glare, reflections, or low light, and they rely on cumbersome manual operations. To address this, we propose a fully unsupervised, end-to-end calibration framework. Our approach adopts a dual-Transformer architecture: a Vision Transformer extracts semantic features from the image stream, while a Point Transformer captures the geometric structure of the 3D LiDAR point cloud. These cross-modal representations are aligned and fused through a neural network, and a regression algorithm estimates the 6-DoF extrinsic transformation matrix. A multi-constraint loss function is designed to enhance structural consistency between modalities, thereby improving calibration stability and accuracy. On the KITTI benchmark, our method achieves a mean rotation error of 0.21° and a translation error of 3.31 cm; on a self-collected dataset, it attains an average reprojection error of 1.52 pixels. These results demonstrate a generalizable and robust solution for optical-sensor extrinsic calibration, enabling precise and self-sufficient perception in real-world applications.
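
To make the described pipeline concrete, below is a minimal, hypothetical PyTorch sketch of a dual-encoder network that fuses image and point-cloud tokens and regresses a 6-DoF extrinsic transform. The module names, dimensions, cross-attention fusion, and the 6D rotation parameterization are assumptions for illustration only; they do not reproduce the paper's actual ViT/Point Transformer backbones, fusion design, or multi-constraint loss.

```python
# Hypothetical sketch of a dual-Transformer extrinsic-regression network (PyTorch).
# All module names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


def rot6d_to_matrix(x: torch.Tensor) -> torch.Tensor:
    """Convert a 6D rotation representation to a 3x3 rotation matrix
    via Gram-Schmidt orthogonalization (Zhou et al., 2019)."""
    a1, a2 = x[:, :3], x[:, 3:]
    b1 = nn.functional.normalize(a1, dim=1)
    b2 = nn.functional.normalize(a2 - (b1 * a2).sum(dim=1, keepdim=True) * b1, dim=1)
    b3 = torch.cross(b1, b2, dim=1)
    return torch.stack([b1, b2, b3], dim=-1)


class DualTransformerCalibNet(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Stand-in image branch: patch embedding + Transformer encoder
        # (a real system would use a pretrained Vision Transformer).
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        self.img_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)

        # Stand-in point branch: per-point MLP + Transformer encoder
        # (a real system would use a Point Transformer over local neighborhoods).
        self.point_embed = nn.Sequential(nn.Linear(3, d_model), nn.ReLU(),
                                         nn.Linear(d_model, d_model))
        self.pc_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)

        # Cross-modal fusion: point tokens attend to image tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

        # Regression head: 3 translation parameters + 6D rotation representation.
        self.head = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(),
                                  nn.Linear(128, 9))

    def forward(self, image: torch.Tensor, points: torch.Tensor):
        # image: (B, 3, H, W); points: (B, N, 3)
        img_tok = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, P, d)
        img_tok = self.img_encoder(img_tok)
        pc_tok = self.pc_encoder(self.point_embed(points))            # (B, N, d)

        fused, _ = self.cross_attn(pc_tok, img_tok, img_tok)          # (B, N, d)
        pooled = fused.mean(dim=1)                                    # (B, d)

        out = self.head(pooled)
        t, rot6d = out[:, :3], out[:, 3:]
        return rot6d_to_matrix(rot6d), t                              # (B,3,3), (B,3)


if __name__ == "__main__":
    net = DualTransformerCalibNet()
    R, t = net(torch.randn(2, 3, 224, 224), torch.randn(2, 4096, 3))
    print(R.shape, t.shape)  # torch.Size([2, 3, 3]) torch.Size([2, 3])
```

In such a sketch the predicted rotation matrix and translation vector would be supervised indirectly, e.g. through reprojection- or structure-consistency terms, rather than ground-truth extrinsics, which is what makes the framework unsupervised.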

Original language: English
Article number: 6944
Journal: Sensors
Volume: 25
Issue number: 22
DOIs
Publication status: Published - Nov 2025
Externally published: Yes

Keywords

  • LiDAR–camera calibration
  • extrinsic parameters
  • sensor fusion
  • unsupervised
