TY - JOUR
T1 - Infrared and Visible Image Fusion with Overlapped Window Transformer
AU - Liu, Xingwang
AU - Mersha, Bemnet Wondimagegnehu
AU - Hirota, Kaoru
AU - Dai, Yaping
N1 - Publisher Copyright:
© Fuji Technology Press Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/)
PY - 2025/7
Y1 - 2025/7
N2 - An overlapping-window-based transformer is proposed for infrared and visible image fusion. A multi-head self-attention mechanism based on overlapping windows is designed. By introducing overlapping regions between windows, local features can interact across different windows, avoiding the discontinuity and information-isolation issues caused by non-overlapping partitions. The proposed model is trained using an unsupervised loss function composed of three terms: pixel, gradient, and structural loss. With the end-to-end model and the unsupervised loss function, our method eliminates the need to manually design complex activity-level measurements and fusion strategies. Extensive experiments on the public TNO (grayscale) and RoadScene (RGB) datasets demonstrate that the proposed method achieves the expected long-range dependency modeling capability when fusing infrared and visible images, as well as positive results in both qualitative and quantitative evaluations.
AB - An overlapping-window-based transformer is proposed for infrared and visible image fusion. A multi-head self-attention mechanism based on overlapping windows is designed. By introducing overlapping regions between windows, local features can interact across different windows, avoiding the discontinuity and information-isolation issues caused by non-overlapping partitions. The proposed model is trained using an unsupervised loss function composed of three terms: pixel, gradient, and structural loss. With the end-to-end model and the unsupervised loss function, our method eliminates the need to manually design complex activity-level measurements and fusion strategies. Extensive experiments on the public TNO (grayscale) and RoadScene (RGB) datasets demonstrate that the proposed method achieves the expected long-range dependency modeling capability when fusing infrared and visible images, as well as positive results in both qualitative and quantitative evaluations.
KW - image fusion
KW - infrared image
KW - local self-attention
KW - transformer
UR - https://www.scopus.com/pages/publications/105023956562
U2 - 10.20965/jaciii.2025.p0838
DO - 10.20965/jaciii.2025.p0838
M3 - Article
AN - SCOPUS:105023956562
SN - 1343-0130
VL - 29
SP - 838
EP - 846
JO - Journal of Advanced Computational Intelligence and Intelligent Informatics
JF - Journal of Advanced Computational Intelligence and Intelligent Informatics
IS - 4
ER -