CMIFDF: A lightweight cross-modal image fusion and weight-sharing object detection network framework

Chunbo Zhao, Bo Mo*, Jie Zhao, Yimeng Tao, Donghui Zhao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Unimodal target detection can no longer meet the demands of detection against complex backgrounds and in harsh environments. Existing cross-modal image fusion and cross-modal target detection algorithms suffer from heavy parameter counts and redundant network designs. To address these problems, a selectable cross-modal image fusion and target detection framework (CMIFDF) is proposed. The framework consists of a lightweight dual-branch cross-modal image fusion network (LDFnet) and a cross-modal object detection algorithm with shareable weights (CM-YOLO), which together make rational use of cross-modal image information and improve detection performance under complex backgrounds. LDFnet is a dual-branch fusion module based on depthwise separable convolution and attention mechanisms; it can quickly and fully extract feature information from visible and infrared images. In CM-YOLO, either fused images or raw images (visible and infrared) are fed into a target detection network with shareable weights for training and detection. A simplified asymptotic feature pyramid network (SAFPN) is proposed, and a lightweight multilayer perceptual attention module (LMA) is designed to enhance the fusion efficiency of the fusion network, so that efficient feature fusion can be achieved with fewer model parameters and low power consumption, improving detection performance. Experiments on publicly available datasets show that the framework can make full use of the feature information in cross-modal input images and can effectively improve detection performance in complex environments.
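The abstract attributes part of LDFnet's light weight to depthwise separable convolution. As a rough illustration of why that choice reduces parameters, the sketch below compares weight counts for a standard convolution and a depthwise separable one. The function names and the channel and kernel sizes (64, 128, 3) are assumptions chosen for illustration; they are not taken from the paper and do not describe LDFnet's actual layers.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative sizes only (assumed, not from the paper).
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1/c_out + 1/k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For these assumed sizes the separable layer needs roughly 12% of the weights of the standard layer, which matches the general ratio 1/c_out + 1/k².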

Original language: English
Article number: 105631
Journal: Infrared Physics and Technology
Volume: 145
Publication status: Published - Mar 2025

Keywords

  • CM-YOLO
  • Complex backgrounds
  • Cross-modal images fusion
  • Shareable weights
