DGIM: Cascaded Dynamic Data Generation for Robust Cross-Modal Image Matching

  • Desheng Weng
  • Wei Li*
  • Chenzhong Gao
  • Xiang Gen Xia
  • Zhicheng Shi
  • Bolun Cui

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

To address the large modality differences that complicate image matching, this article proposes a cascaded learning framework in which a dynamic data engine guides the optimization of an end-to-end matching model. The engine supplies enough cross-modal training data for the model to adapt fully to cross-modal features. It integrates a random homography transformation module with a lightweight image generation model, synthesizing cross-modal image pairs online with geometric variations and diverse styles, and so exposes the matching model to rich cross-modal variation during training. The matching model adopts a hybrid architecture that combines a convolutional neural network (CNN) backbone with Transformer attention mechanisms, integrating multiscale local feature extraction with global context modeling. A proposed stepwise aggregation strategy keeps feature extraction efficient, and a subsequent coarse-to-fine matching strategy achieves accurate and robust feature alignment. Comprehensive experiments on both self-collected and public cross-modal image matching datasets demonstrate that the proposed method, data generation for image matching (DGIM), outperforms existing state-of-the-art approaches in cross-modal matching performance while balancing efficiency and effectiveness. It also shows broad practical potential across multiple fields and scenes. This work provides novel solutions and evaluation benchmarks for cross-modal image matching tasks. The code and testing dataset will be made publicly available at https://github.com/LotrL/DGIM.
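
The abstract does not say how the random homography transformation module samples its transformations. As a rough illustration only, and not the authors' implementation, the sketch below uses one common approach: shift the four image corners by random offsets and solve for the homography that maps the original corners to the shifted ones. The function names and the `max_shift` parameter are hypothetical.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve the 3x3 homography mapping four src points onto four dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints per correspondence (direct linear transform).
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def random_homography(width, height, max_shift=0.2, rng=None):
    """Sample a homography by randomly shifting the four image corners.

    max_shift is a hypothetical knob: the largest corner offset, expressed
    as a fraction of the image width (x) and height (y).
    """
    rng = np.random.default_rng() if rng is None else rng
    corners = np.array(
        [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
        dtype=np.float64,
    )
    scale = np.array([width, height], dtype=np.float64) * max_shift
    shifted = corners + rng.uniform(-1.0, 1.0, size=(4, 2)) * scale
    return homography_from_points(corners, shifted), corners, shifted


def apply_homography(H, points):
    """Map an (N, 2) array of points through H using homogeneous coordinates."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Usage: sample one transformation for a 640x480 image.
H, corners, shifted = random_homography(640, 480, rng=np.random.default_rng(0))
```

In a data engine of the kind described, one image of a generated pair would be warped with `H`, and the same `H` would then give ground-truth correspondences for supervising the matching model. How DGIM actually couples the warp with its image generation model is not stated here.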

Original language: English
Article number: 4708616
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
Publication status: Published - 2025
Externally published: Yes

Keywords

  • Cross-modal
  • data augmentation
  • feature aggregation
  • generative model
  • image matching
