Abstract
Remote sensing image–text retrieval (RSITR) is a pivotal task aimed at achieving efficient bidirectional matching between visual content and textual descriptions in large-scale remote sensing databases. Nevertheless, it faces a fundamental challenge: the severe information asymmetry between sparse, abstract captions and dense, multi-scale overhead imagery. Prior works predominantly focus on learning static cross-modal representations during training; however, this frozen inference process is fundamentally limited in bridging the asymmetry due to its inability to dynamically compensate for missing details or resolve visual ambiguities in heterogeneous scenes. To overcome this limitation, we propose CADRE (Test-Time Candidate-Aware Dual Refinement), a retrieval-backbone-agnostic framework exploiting retrieved candidates as feedback for bidirectional alignment. Operating on a novel Inject-and-Suppress paradigm, CADRE comprises two complementary modules. First, the Visual-Context Injection (VCI) module addresses textual sparsity by incorporating an adaptive filtering mechanism to efficiently mine hierarchical visual evidence from high-confidence candidates and inject it into the query via a domain-adapted Multimodal Large Language Model (MLLM). Second, the Query-Guided Disambiguation (QGD) module targets visual ambiguity by generating multi-view visual hypotheses and utilizing the query as a semantic probe to suppress background noise. Extensive experiments on three standard benchmarks (RSICD, RSITMD, and UCM) demonstrate good transferability across several strong RSITR backbones.
| Original language | English |
|---|---|
| Article number | 1389 |
| Journal | Remote Sensing |
| Volume | 18 |
| Issue | 9 |
| DOI | |
| Publication status | Published - May 2026 |
| Externally published | Yes |