PromptFusion: Harmonized Semantic Prompt Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Xingyuan Li, Zirui Wang, Zhiying Jiang, Wei Zhong, Wei Fan, Bin Xu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to handle the disparities between modalities, which degrades the fine details and prominent targets in the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. First, to better characterize the features of each modality, we design a contourlet autoencoder that separates and extracts the high- and low-frequency components of each modality, improving the extraction of fine details and textures. We also introduce a prompt learning mechanism that uses positive and negative prompts and leverages Vision-Language Models to improve the fusion model's understanding and identification of targets in multi-modality images, which in turn improves performance on downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization, which reduces the intricate non-singleton, non-convex bi-level problem to a series of convergent, differentiable single-level optimization problems that can be solved effectively by gradient descent. Our approach advances the state of the art, delivering superior fusion quality and boosting the performance of related downstream tasks.
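The abstract does not state the form of the positive/negative prompt loss. As a rough illustration only, one common way to use a Vision-Language Model's image and text embeddings for this purpose is a two-way contrastive loss in the style of CLIP: the fused image's embedding is pushed toward the positive prompt and away from the negative one. The sketch below is in plain Python; the function names, the temperature of 0.07, and the loss form itself are assumptions for illustration, not PromptFusion's published formulation. In practice the embeddings would come from a VLM encoder rather than hand-written vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def prompt_contrast_loss(
    image_emb: Sequence[float],
    positive_emb: Sequence[float],
    negative_emb: Sequence[float],
    temperature: float = 0.07,  # assumed value, common in CLIP-style setups
) -> float:
    """Hypothetical positive/negative prompt loss (CLIP-style, assumed).

    Treats the (positive, negative) prompt pair as a two-way classification
    over temperature-scaled cosine similarities and returns the cross-entropy
    of the positive prompt. Lower values mean the image embedding is closer
    to the positive prompt than to the negative one.
    """
    s_pos = cosine_similarity(image_emb, positive_emb) / temperature
    s_neg = cosine_similarity(image_emb, negative_emb) / temperature
    # Log-sum-exp with a max shift keeps exp() from overflowing.
    m = max(s_pos, s_neg)
    log_denominator = m + math.log(math.exp(s_pos - m) + math.exp(s_neg - m))
    return log_denominator - s_pos


if __name__ == "__main__":
    positive = [1.0, 0.0]  # stand-in embedding for a positive prompt
    negative = [0.0, 1.0]  # stand-in embedding for a negative prompt
    print(prompt_contrast_loss([0.9, 0.1], positive, negative))
    print(prompt_contrast_loss([0.1, 0.9], positive, negative))
```

The first call yields a smaller loss than the second, because that image embedding lies nearer the positive prompt. The max-shifted log-sum-exp matters here: dividing cosine similarities by a small temperature scales them to roughly ±14, where a naive exp-then-log would lose precision.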

Original language: English
Pages (from-to): 502-515
Number of pages: 14
Journal: IEEE/CAA Journal of Automatica Sinica
Volume: 12
Issue number: 3
Publication status: Published - 2025

Keywords

  • Bi-level optimization
  • image fusion
  • infrared and visible image
  • prompt learning
