DCTNet: A Heterogeneous Dual-Branch Multi-Cascade Network for Infrared and Visible Image Fusion

Jinfu Li; Lei Liu; Hong Song; Yuqi Huang; Junjun Jiang; Jian Yang

doi:10.1109/TIM.2023.3325520

Jinfu Li, Lei Liu, Hong Song^*, Yuqi Huang, Junjun Jiang, Jian Yang

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Deep learning has become a popular technique for infrared and visible image fusion (IVIF) due to its powerful feature representation abilities. Existing methods often employ a convolutional neural network (CNN), a Transformer, or a mixed CNN-Transformer to treat different modalities indiscriminately. However, we observe that these homogeneous fusion networks (i.e., the same network for both modalities) struggle to handle the specific characteristics of each modality, leading to fused images that are contaminated with interference or lack modality-specific information. To alleviate this issue, we propose a heterogeneous dual-branch multi-cascade fusion network based on CNN and Transformer, named DCTNet, which aims to maintain the focus of each modality independently, such as the radiation intensity of infrared images and the texture details of visible images. In DCTNet, the CNN branch is specifically designed to exploit local features in visible images by utilizing stacked residual dense CNNs, while the Transformer branch is tailored to model long-range dependencies, capturing the overall temperature distribution and thermal patterns of a scene in infrared images by cascading residual Transformers. In addition, we introduce an adaptive fusion interaction module (AFIM) that leverages attention mechanisms to adaptively highlight informative regions in the fused feature maps. The module assigns weights to these regions based on their contributions and dynamically merges features from the two branches at multiple levels. Experimental results on three public datasets demonstrate that the proposed method outperforms state-of-the-art methods in both quantitative and qualitative evaluations. Moreover, we showcase the promising performance of DCTNet in downstream object detection and semantic segmentation applications on a widely accepted benchmark.
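The weighting idea in the abstract (score each branch's contribution at a position, then merge the two branches with those weights at several levels) can be illustrated with a small sketch. This is not the authors' AFIM: the function names `softmax_pair`, `fuse_features`, and `fuse_multilevel` are hypothetical, and scoring a branch by the absolute value of its activation is an assumption made only to keep the example self-contained. The abstract does not specify how the attention weights are computed.

```python
import math


def softmax_pair(a, b):
    """Numerically stable two-way softmax; the two weights sum to 1."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    total = ea + eb
    return ea / total, eb / total


def fuse_features(visible_feat, infrared_feat):
    """Fuse two equally sized 2-D feature maps position by position.

    Assumption (not from the paper): a branch's contribution at a position
    is scored by the absolute activation there, standing in for whatever
    attention mechanism the real module uses.
    """
    fused = []
    for vis_row, ir_row in zip(visible_feat, infrared_feat):
        fused_row = []
        for v, r in zip(vis_row, ir_row):
            w_v, w_r = softmax_pair(abs(v), abs(r))
            fused_row.append(w_v * v + w_r * r)
        fused.append(fused_row)
    return fused


def fuse_multilevel(visible_levels, infrared_levels):
    """Apply the same fusion rule independently at each feature level."""
    return [fuse_features(v, r) for v, r in zip(visible_levels, infrared_levels)]


# Toy feature maps: the visible branch is strong on the left,
# the infrared branch is strong on the right.
visible = [[2.0, 0.0]]
infrared = [[0.0, 2.0]]
fused = fuse_features(visible, infrared)
levels = fuse_multilevel([visible, visible], [infrared, infrared])
```

In this toy case each fused position is dominated by whichever branch has the stronger response there, which is the behaviour the abstract attributes to adaptive weighting. A trained attention module would replace the fixed absolute-value scoring used here.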

Original language	English
Article number	5030914
Journal	IEEE Transactions on Instrumentation and Measurement
Volume	72
DOIs	https://doi.org/10.1109/TIM.2023.3325520
Publication status	Published - 2023

Keywords

Attention mechanism
Transformer
convolutional neural network (CNN)
heterogeneous fusion network
infrared and visible image fusion (IVIF)

Cite this

@article{aa65693da4e341809a5c5917c7001800,

title = "DCTNet: A Heterogeneous Dual-Branch Multi-Cascade Network for Infrared and Visible Image Fusion",

abstract = "Deep learning has become a popular technique for infrared and visible image fusion (IVIF) due to its powerful feature representation abilities. Existing methods often employ a convolutional neural network (CNN), a Transformer, or a mixed CNN-Transformer to treat different modalities indiscriminately. However, we observe that these homogeneous fusion networks (i.e., the same network for both modalities) struggle to handle the specific characteristics of each modality, leading to fused images that are contaminated with interference or lack modality-specific information. To alleviate this issue, we propose a heterogeneous dual-branch multi-cascade fusion network based on CNN and Transformer, named DCTNet, which aims to maintain the focus of each modality independently, such as the radiation intensity of infrared images and the texture details of visible images. In DCTNet, the CNN branch is specifically designed to exploit local features in visible images by utilizing stacked residual dense CNNs, while the Transformer branch is tailored to model long-range dependencies, capturing the overall temperature distribution and thermal patterns of a scene in infrared images by cascading residual Transformers. In addition, we introduce an adaptive fusion interaction module (AFIM) that leverages attention mechanisms to adaptively highlight informative regions in the fused feature maps. The module assigns weights to these regions based on their contributions and dynamically merges features from the two branches at multiple levels. Experimental results on three public datasets demonstrate that the proposed method outperforms state-of-the-art methods in both quantitative and qualitative evaluations. Moreover, we showcase the promising performance of DCTNet in downstream object detection and semantic segmentation applications on a widely accepted benchmark.",

keywords = "Attention mechanism, Transformer, convolutional neural network (CNN), heterogeneous fusion network, infrared and visible image fusion (IVIF)",

author = "Jinfu Li and Lei Liu and Hong Song and Yuqi Huang and Junjun Jiang and Jian Yang",

note = "Publisher Copyright: {\textcopyright} 1963-2012 IEEE.",

year = "2023",

doi = "10.1109/TIM.2023.3325520",

language = "English",

volume = "72",

journal = "IEEE Transactions on Instrumentation and Measurement",

issn = "0018-9456",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - DCTNet: A Heterogeneous Dual-Branch Multi-Cascade Network for Infrared and Visible Image Fusion

AU - Li, Jinfu

AU - Liu, Lei

AU - Song, Hong

AU - Huang, Yuqi

AU - Jiang, Junjun

AU - Yang, Jian

PY - 2023

Y1 - 2023

AB - Deep learning has become a popular technique for infrared and visible image fusion (IVIF) due to its powerful feature representation abilities. Existing methods often employ a convolutional neural network (CNN), a Transformer, or a mixed CNN-Transformer to treat different modalities indiscriminately. However, we observe that these homogeneous fusion networks (i.e., the same network for both modalities) struggle to handle the specific characteristics of each modality, leading to fused images that are contaminated with interference or lack modality-specific information. To alleviate this issue, we propose a heterogeneous dual-branch multi-cascade fusion network based on CNN and Transformer, named DCTNet, which aims to maintain the focus of each modality independently, such as the radiation intensity of infrared images and the texture details of visible images. In DCTNet, the CNN branch is specifically designed to exploit local features in visible images by utilizing stacked residual dense CNNs, while the Transformer branch is tailored to model long-range dependencies, capturing the overall temperature distribution and thermal patterns of a scene in infrared images by cascading residual Transformers. In addition, we introduce an adaptive fusion interaction module (AFIM) that leverages attention mechanisms to adaptively highlight informative regions in the fused feature maps. The module assigns weights to these regions based on their contributions and dynamically merges features from the two branches at multiple levels. Experimental results on three public datasets demonstrate that the proposed method outperforms state-of-the-art methods in both quantitative and qualitative evaluations. Moreover, we showcase the promising performance of DCTNet in downstream object detection and semantic segmentation applications on a widely accepted benchmark.

KW - Attention mechanism

KW - Transformer

KW - convolutional neural network (CNN)

KW - heterogeneous fusion network

KW - infrared and visible image fusion (IVIF)

UR - http://www.scopus.com/inward/record.url?scp=85174856413&partnerID=8YFLogxK

U2 - 10.1109/TIM.2023.3325520

DO - 10.1109/TIM.2023.3325520

M3 - Article

AN - SCOPUS:85174856413

SN - 0018-9456

VL - 72

JO - IEEE Transactions on Instrumentation and Measurement

JF - IEEE Transactions on Instrumentation and Measurement

M1 - 5030914

ER -
