
Test-Time Candidate-Aware Dual Refinement for Remote Sensing Image–Text Retrieval

  • Bofan Zhang
  • Hao Wu*
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Remote sensing image–text retrieval (RSITR) is a pivotal task aimed at achieving efficient bidirectional matching between visual content and textual descriptions in large-scale remote sensing databases. Nevertheless, it faces a fundamental challenge: the severe information asymmetry between sparse, abstract captions and dense, multi-scale overhead imagery. Prior works predominantly focus on learning static cross-modal representations during training; however, this frozen inference process is fundamentally limited in bridging the asymmetry due to its inability to dynamically compensate for missing details or resolve visual ambiguities in heterogeneous scenes. To overcome this limitation, we propose CADRE (Test-Time Candidate-Aware Dual Refinement), a retrieval-backbone-agnostic framework exploiting retrieved candidates as feedback for bidirectional alignment. Operating on a novel Inject-and-Suppress paradigm, CADRE comprises two complementary modules. First, the Visual-Context Injection (VCI) module addresses textual sparsity by incorporating an adaptive filtering mechanism to efficiently mine hierarchical visual evidence from high-confidence candidates and inject it into the query via a domain-adapted Multimodal Large Language Model (MLLM). Second, the Query-Guided Disambiguation (QGD) module targets visual ambiguity by generating multi-view visual hypotheses and utilizing the query as a semantic probe to suppress background noise. Extensive experiments on three standard benchmarks (RSICD, RSITMD, and UCM) demonstrate good transferability across several strong RSITR backbones.
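The candidate-feedback idea in the abstract can be illustrated with a small sketch. This is not the authors' CADRE implementation: the MLLM-based Visual-Context Injection and the Query-Guided Disambiguation module are not reproduced here. Instead, the sketch swaps in a simple Rocchio-style pseudo-relevance feedback step in embedding space, assuming the query and gallery items are already embedded as plain vectors by some retrieval backbone. The function names and the parameters `top_k`, `min_score`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank(query, gallery):
    """Return gallery indices sorted by descending similarity, plus raw scores."""
    scores = [cosine(query, g) for g in gallery]
    order = sorted(range(len(gallery)), key=lambda i: -scores[i])
    return order, scores


def refine_query(query, gallery, top_k=3, min_score=0.5, alpha=0.7):
    """Mix the query with a score-weighted centroid of confident candidates.

    Assumption: this stands in for 'injecting' candidate evidence into the
    query; the paper does this through an MLLM, not by vector averaging.
    """
    order, scores = rank(query, gallery)
    # Adaptive filter (hypothetical): keep only top-k candidates above a threshold.
    confident = [i for i in order[:top_k] if scores[i] >= min_score]
    q = normalize(query)
    if not confident:
        return q  # no trustworthy feedback, so leave the query unchanged
    weight_sum = sum(scores[i] for i in confident)
    units = {i: normalize(gallery[i]) for i in confident}
    centroid = [
        sum(scores[i] * units[i][d] for i in confident) / weight_sum
        for d in range(len(q))
    ]
    mixed = [alpha * q[d] + (1 - alpha) * centroid[d] for d in range(len(q))]
    return normalize(mixed)


# Toy usage: a 3-D query and a gallery of three embedded items.
query = [1.0, 0.0, 0.0]
gallery = [[0.9, 0.1, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
first_order, _ = rank(query, gallery)
refined = refine_query(query, gallery, top_k=2)
second_order, _ = rank(refined, gallery)
```

In this toy setting the refined query drifts toward the two confident candidates while staying unit length. A real system would re-rank the gallery with the refined query and, per the abstract, also refine the image side.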

Original language: English
Article number: 1389
Journal: Remote Sensing
Volume: 18
Issue: 9
DOI
Publication status: Published - May 2026
Externally published
