Mask-Guided Cross-Modality Fusion Network for Visible-Infrared Vehicle Detection

Lingyun Tian, Qiang Shen, Zilong Deng*, Yang Gao, Simiao Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Drone-based vehicle detection is crucial for intelligent traffic management. However, current methods that rely on a single modality, either visible or infrared, struggle with precision and robustness, especially in adverse weather conditions. Effectively integrating cross-modal information to enhance vehicle detection remains a significant challenge. In this letter, we propose a mask-guided cross-modality fusion method, called MCMF, for robust and accurate visible-infrared vehicle detection. First, we construct a framework consisting of three branches: two dedicated to the visible and infrared modalities respectively, and a third tailored to the fused multi-modal features. Second, we introduce a Location-Sensitive Masked AutoEncoder (LMAE) for intermediate-level feature fusion. Specifically, our LMAE uses masks to cover intermediate-level features of one modality based on the prediction hierarchy of the other modality, and then distills cross-modality guidance information through regularization constraints. Through a self-learning paradigm, this strategy preserves the useful information from both modalities while eliminating redundant information from each. Finally, the fused features are fed into an uncertainty-based detection head to predict vehicle bounding boxes. Evaluated on the DroneVehicle dataset, our MCMF reaches 71.42% mAP, outperforming an established baseline method by 7.42%. Ablation studies further demonstrate the effectiveness of our LMAE for visible-infrared fusion.
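
The abstract does not say how the LMAE builds its masks or which regularizer it applies, so the following is only an illustrative sketch of the general idea of masking one modality's features at locations that the other modality flags. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names, the objectness map used as guidance, the 0.5 threshold, and the mean-squared-error penalty.

```python
import numpy as np


def location_mask(objectness, threshold=0.5):
    """Binary (H, W) mask: 1 where the guiding modality predicts an object.

    Assumption: guidance comes as a per-location confidence map in [0, 1].
    """
    return (objectness >= threshold).astype(np.float32)


def mask_features(features, mask):
    """Zero out (C, H, W) features at the locations selected by an (H, W) mask."""
    return features * (1.0 - mask)[None, :, :]


def masked_reconstruction_loss(reconstructed, target, mask):
    """Mean squared error computed only over the masked locations.

    Assumption: stands in for the unspecified regularization constraint.
    """
    squared_error = (reconstructed - target) ** 2 * mask[None, :, :]
    count = mask.sum() * target.shape[0]
    return float(squared_error.sum() / count) if count > 0 else 0.0


# Toy data: 4-channel visible features on an 8x8 grid, plus a hypothetical
# infrared confidence map with one 3x3 high-confidence region.
rng = np.random.default_rng(0)
visible = rng.standard_normal((4, 8, 8)).astype(np.float32)
infrared_objectness = np.zeros((8, 8), dtype=np.float32)
infrared_objectness[2:5, 3:6] = 0.9

mask = location_mask(infrared_objectness)
masked_visible = mask_features(visible, mask)

# A real model would pass masked_visible through a decoder; here the masked
# input itself plays the role of an (untrained) reconstruction.
loss = masked_reconstruction_loss(masked_visible, visible, mask)
```

In this sketch the loss is nonzero only because the masked region has not been reconstructed; a trained decoder would be expected to lower it by drawing on the other modality's features.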

Original language: English
Journal: IEEE Signal Processing Letters
Publication status: Accepted/In press - 2025

Keywords

  • Drone-based vehicle detection
  • location-sensitive masked autoencoder
  • masked guided cross-modality fusion
  • regularization constraint
