TY - JOUR
T1 - Wave-Cross
T2 - Balancing Thermal Saliency and Visual Detail in Infrared–Visible Image Fusion
AU - Zhou, Zhiguo
AU - Gu, Jiahao
AU - Li, Shuya
AU - Shi, Yonggang
AU - Zhou, Xuehua
N1 - Publisher Copyright: © 2026 by the authors.
PY - 2026/1
Y1 - 2026/1
N2 - Infrared and visible image fusion (IVIF) integrates the thermal saliency of infrared images (IRs) with the structural details of visible images (VIs) to produce comprehensive scene representations. Existing methods often overemphasize one modality, leading to loss of temperature readability or visual details. To address this, we propose Wave-Cross, a wavelet-based fusion framework. Using the discrete wavelet transform (DWT), IR low-frequency sub-bands encode thermal distribution, while VI high-frequency sub-bands capture textural details. Cross-attention adaptively recombines these sub-bands, suppressing modality-specific noise and balancing complementary features. Additionally, we introduce a Heat-Consistency Loss, which enforces pixel-wise thermal ordering and local energy preservation in a self-supervised manner, ensuring the fused image retains IR interpretability while enhancing VI sharpness. Experiments on the TNO, MSRS, and M3FD datasets demonstrate the effectiveness of the proposed method. Compared with state-of-the-art baselines, Wave-Cross achieves superior performance on objective metrics such as SD, AG, SCD, SF, CC, EN, NABF, and MS-SSIM, yielding clearer details and more stable thermal saliency under challenging interference conditions. These results highlight the framework’s potential for practical applications in surveillance, autonomous driving, and fault diagnosis.
AB - Infrared and visible image fusion (IVIF) integrates the thermal saliency of infrared images (IRs) with the structural details of visible images (VIs) to produce comprehensive scene representations. Existing methods often overemphasize one modality, leading to loss of temperature readability or visual details. To address this, we propose Wave-Cross, a wavelet-based fusion framework. Using the discrete wavelet transform (DWT), IR low-frequency sub-bands encode thermal distribution, while VI high-frequency sub-bands capture textural details. Cross-attention adaptively recombines these sub-bands, suppressing modality-specific noise and balancing complementary features. Additionally, we introduce a Heat-Consistency Loss, which enforces pixel-wise thermal ordering and local energy preservation in a self-supervised manner, ensuring the fused image retains IR interpretability while enhancing VI sharpness. Experiments on the TNO, MSRS, and M3FD datasets demonstrate the effectiveness of the proposed method. Compared with state-of-the-art baselines, Wave-Cross achieves superior performance on objective metrics such as SD, AG, SCD, SF, CC, EN, NABF, and MS-SSIM, yielding clearer details and more stable thermal saliency under challenging interference conditions. These results highlight the framework’s potential for practical applications in surveillance, autonomous driving, and fault diagnosis.
KW - cross-attention
KW - heat-consistency
KW - image fusion
KW - infrared and visible images
KW - wavelet transform
UR - https://www.scopus.com/pages/publications/105028562333
U2 - 10.3390/electronics15020321
DO - 10.3390/electronics15020321
M3 - Article
AN - SCOPUS:105028562333
SN - 2079-9292
VL - 15
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 2
M1 - 321
ER -