Abstract
Despite recent accomplishments in joint infrared-visible imaging, the bimodal defocus blur (BDB) phenomenon has received scant attention. Our analysis reveals that BDB is predominantly attributable to disparities in optical parameters between cameras, which give rise to two primary challenges: incomplete single-modal information and difficulty in cross-modal information interaction. Regarding the former, the infrared modality is the primary victim, as deblurring networks’ bias toward high-frequency content results in erroneous low-frequency reconstruction (e.g., over-sharpening). Regarding the latter, the relative nature of the blur effect can make it ambiguous which modality’s information should be prioritized for guidance, and conflicts may arise between the clear components of the blurred image and the blurry components of the clear image. To address these issues, we propose the first de-bimodal defocus blur (DBDB) method, which consists of a low-frequency semantic hold (LSH) module built on a pre-trained infrared model and a cross-modal complementary feature induction (CCFI) module driven by a max-min blur entropy loss. LSH ensures that the low-frequency information captured by the infrared modality contains no misleading content, while CCFI facilitates the acquisition of accurate information through adaptive adjustment and the loss function. Experimental results on deblurring and downstream tasks on two synthetic datasets demonstrate the superiority of our method.
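The abstract does not specify the form of the max-min blur entropy loss. Purely as an illustration of the general idea, a simple sharpness proxy is the Shannon entropy of an image's gradient-magnitude histogram: defocus blur concentrates gradient energy near zero, lowering the entropy. The sketch below (the names `blur_entropy` and `max_min_entropy_loss` are hypothetical, not from the paper) penalizes a restored image whose entropy falls below that of the clearer guiding modality, assuming images are normalized to [0, 1]:

```python
import numpy as np

def blur_entropy(img, bins=32):
    """Shannon entropy of the gradient-magnitude histogram.

    Blurry images concentrate gradient energy near zero, giving a
    peaky, low-entropy histogram; sharp images spread it out.
    Assumes the image is normalized to the [0, 1] range.
    """
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy).ravel()
    hist, _ = np.histogram(mag, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def max_min_entropy_loss(restored, guide):
    """Hypothetical max-min objective: push the restored image's blur
    entropy up (sharper) until it matches the clearer guiding modality,
    with zero loss once it is at least as sharp."""
    return max(blur_entropy(guide) - blur_entropy(restored), 0.0)
```

A training loop would minimize this loss, which implicitly maximizes the restored image's entropy while the guiding modality sets the target; the actual CCFI loss in the paper may differ substantially.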
Original language | English |
---|---|
Article number | 7 |
Journal | Visual Intelligence |
Volume | 3 |
Issue number | 1 |
DOIs | |
Publication status | Published - Dec 2025 |
Externally published | Yes |
Keywords
- Defocus blur
- Image reconstruction
- Multimodal learning
- Pre-training model
- RGBT