Abstract
In current research, unimodal target detection can no longer meet the demands of detection in complex backgrounds and harsh environments. To address the heavy parameter counts and redundant network designs of existing cross-modal image fusion and cross-modal target detection algorithms, a selectable cross-modal image fusion and target detection framework (CMIFDF) is proposed. The framework consists of a lightweight dual-branch cross-modal image fusion network (LDFnet) and a cross-modal object detection algorithm with shareable weights (CM-YOLO), which together exploit cross-modal image information to improve detection performance in complex backgrounds. LDFnet is a dual-branch fusion module based on depthwise separable convolution and attention mechanisms; it quickly and fully extracts feature information from visible and infrared images. In CM-YOLO, fused images or raw images (visible and infrared) are fed into a detection network with shareable weights for training and detection. A simplified asymptotic feature pyramid network (SAFPN) is proposed, and a lightweight multilayer perceptron attention module (LMA) is designed to enhance the fusion efficiency of the fusion network, so that features can be fused efficiently with fewer model parameters and lower power dissipation, thereby improving detection performance. Experiments on publicly available datasets show that the framework makes full use of the feature information in cross-modal image inputs and effectively improves detection performance in complex environments.
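The abstract describes LDFnet as a dual-branch fusion module built from depthwise separable convolutions and attention. The following is a minimal sketch of how such a dual-branch visible/infrared fusion block could be organized, assuming PyTorch; the class names, channel counts, and the squeeze-and-excitation style attention gate are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv followed by a pointwise conv: a lightweight feature extractor."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention gate (illustrative stand-in for the paper's attention)."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class DualBranchFusion(nn.Module):
    """Extracts features from visible and infrared inputs in parallel branches,
    reweights them with channel attention, and fuses them into a single image."""
    def __init__(self, ch=16):
        super().__init__()
        self.vis_branch = DepthwiseSeparableConv(3, ch)   # visible (RGB) branch
        self.ir_branch = DepthwiseSeparableConv(1, ch)    # infrared (single-channel) branch
        self.attn = ChannelAttention(2 * ch)
        self.fuse = nn.Conv2d(2 * ch, 1, kernel_size=1)   # project to a fused image

    def forward(self, vis, ir):
        feats = torch.cat([self.vis_branch(vis), self.ir_branch(ir)], dim=1)
        return torch.sigmoid(self.fuse(self.attn(feats)))

# Usage sketch: a fused single-channel image from a visible/infrared pair.
fused = DualBranchFusion()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```

In the framework described above, either this kind of fused output or the raw visible/infrared images would then be passed to the shared-weight detection network (CM-YOLO).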
Original language | English
---|---
Article number | 105631
Journal | Infrared Physics and Technology
Volume | 145
DOIs |
Publication status | Published - Mar 2025
Keywords
- CM-YOLO
- Complex backgrounds
- Cross-modal images fusion
- Shareable weights