Abstract
Cross-view geo-localization (CVGL) matches ground-level images against geo-tagged aerial imagery for precise localization. The task remains challenging because of severe viewpoint discrepancies, partial correspondence between views, and significant domain shifts across geographic regions. Existing methods achieve high accuracy within specific datasets but generalize poorly to unseen environments. In this paper, we propose GenGeo, a unified framework that integrates vision foundation model representations with a matching-aware aggregation mechanism to address these challenges. Specifically, we use DINOv2 to extract semantically rich, transferable features and revisit the SALAD aggregation module in the context of CVGL. A shared clustering strategy projects features from both views into a unified assignment space, enabling implicit semantic alignment across views, while the dustbin mechanism filters out unmatched and non-informative regions that arise from partial correspondence. Extensive experiments on three large-scale benchmarks (CVUSA, CVACT, and VIGOR) demonstrate that GenGeo achieves state-of-the-art cross-dataset generalization and consistently improves robustness under severe domain shifts and spatial misalignment. Notably, our method outperforms the baseline by 14.65% in Top-1 Recall on the CVUSA-to-CVACT transfer task. These results highlight the effectiveness of combining foundation model representations with matching-aware aggregation, and suggest that enforcing semantic consistency in a shared assignment space is a promising direction for generalizable cross-view geo-localization.
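The shared assignment space and dustbin described in the abstract can be illustrated with a minimal NumPy sketch. It is not the GenGeo implementation: it assumes local features are already extracted (for example by a backbone such as DINOv2), replaces SALAD's optimal-transport assignment with a plain softmax, and uses a single linear scoring matrix. The names `aggregate`, `top1_recall`, `score_weights`, and `dustbin_score` are hypothetical and chosen only for illustration.

```python
import numpy as np


def aggregate(features, score_weights, dustbin_score=0.0):
    """Pool local features into one global descriptor (illustrative only).

    features:      (N, D) local features from one image (either view).
    score_weights: (D, K) scoring matrix SHARED by both views (assumption:
                   a linear layer stands in for the learned scoring module).
    dustbin_score: scalar score of the extra "dustbin" column that absorbs
                   features not worth assigning to any cluster.
    """
    n = features.shape[0]
    scores = features @ score_weights                    # (N, K)
    dustbin = np.full((n, 1), dustbin_score)             # (N, 1)
    logits = np.concatenate([scores, dustbin], axis=1)   # (N, K + 1)

    # Simplification: row-wise softmax instead of Sinkhorn optimal transport.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Drop the dustbin column: mass assigned there is discarded.
    assign = probs[:, :-1]                               # (N, K)

    # Weighted sum of features per cluster, flattened and L2-normalized.
    clusters = assign.T @ features                       # (K, D)
    desc = clusters.ravel()
    return desc / (np.linalg.norm(desc) + 1e-12)


def top1_recall(ground_descs, aerial_descs):
    """Fraction of ground queries whose nearest aerial descriptor is correct.

    Assumes ground_descs[i] and aerial_descs[i] depict the same location.
    """
    sim = ground_descs @ aerial_descs.T                  # cosine similarity
    nearest = sim.argmax(axis=1)
    return float(np.mean(nearest == np.arange(len(ground_descs))))
```

Because both views are pooled with the same `score_weights`, cluster `k` refers to the same slot of the descriptor for ground and aerial images, which is the sense in which the assignment space is shared. Raising `dustbin_score` sends more features to the discarded column.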
| Original language | English |
|---|---|
| Article number | 1116 |
| Journal | Remote Sensing |
| Volume | 18 |
| Issue number | 8 |
| DOIs | |
| Publication status | Published - Apr 2026 |
| Externally published | Yes |
Keywords
- cross-view geo-localization
- feature aggregation
- generalization capacity
- remote sensing imagery
- vision foundation models
Title: GenGeo: Robust Cross-View Geo-Localization via Foundation Model and Dynamic Feature Aggregation