
GenGeo: Robust Cross-View Geo-Localization via Foundation Model and Dynamic Feature Aggregation

Rong Wang, Wen Yuan, Wu Yuan*, Tong Liu, Xiao Xi, Yaokai Zhu

*Corresponding author for this work

  • CAS - Institute of Geographical Sciences and Natural Resources Research
  • University of Chinese Academy of Sciences
  • Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Cross-view geo-localization (CVGL) aims to match ground-level images with geo-tagged aerial imagery for precise localization, but remains challenging due to severe viewpoint discrepancies, partial correspondence, and significant domain shifts across geographic regions. While existing methods achieve high accuracy within specific datasets, their generalization ability to unseen environments is limited. In this paper, we propose GenGeo, a unified framework that integrates vision foundation model representations with a matching-aware aggregation mechanism to address these challenges. Specifically, we leverage DINOv2 to extract semantically rich and transferable features, and revisit the SALAD aggregation module in the context of CVGL. By employing a shared clustering strategy, the proposed framework projects cross-view features into a unified assignment space, enabling implicit semantic alignment across views, while the dustbin mechanism effectively filters unmatched and non-informative regions arising from partial correspondence. Extensive experiments on three large-scale benchmarks (CVUSA, CVACT, and VIGOR) demonstrate that GenGeo achieves state-of-the-art performance in cross-dataset generalization and consistently improves robustness under severe domain shifts and spatial misalignment. Notably, our method outperforms the baseline by 14.65% in Top-1 Recall on the CVUSA-to-CVACT transfer task. These results highlight the effectiveness of combining foundation model representations with matching-aware aggregation, and suggest that enforcing semantic consistency in a shared assignment space is a promising direction for generalizable cross-view geo-localization.
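The shared assignment space and dustbin mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: random arrays stand in for DINOv2 patch features, the scoring and reduction layers are plain linear maps (`w_score`, `w_reduce`), and the alternating normalization is a simplified Sinkhorn-style routine. All function names, shapes, and the fixed `dustbin_score` are assumptions made for illustration.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log-sum-exp that keeps the reduced axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, n_iters=3):
    """Soft-assign N local features to K clusters plus one dustbin column.

    scores: (N, K) feature-to-cluster scores.
    Returns an (N, K) assignment; the dustbin column is dropped, so each
    row sums to at most 1 and uninformative features carry little mass.
    """
    n, k = scores.shape
    dustbin = np.full((n, 1), dustbin_score)  # assumed fixed; learned in practice
    log_p = np.concatenate([scores, dustbin], axis=1)
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=0)  # normalize over features
        log_p = log_p - logsumexp(log_p, axis=1)  # normalize over clusters + dustbin
    return np.exp(log_p[:, :k])


def aggregate(features, w_score, w_reduce, dustbin_score=1.0):
    """Aggregate (N, D) local features into one L2-normalized descriptor."""
    assign = sinkhorn_with_dustbin(features @ w_score, dustbin_score)  # (N, K)
    reduced = features @ w_reduce                                      # (N, d)
    desc = (assign.T @ reduced).reshape(-1)                            # (K * d,)
    return desc / (np.linalg.norm(desc) + 1e-12)


# Hypothetical sizes: N patches, D feature dims, K clusters, d reduced dims.
N, D, K, d = 32, 16, 4, 8
rng = np.random.default_rng(0)
w_score = rng.normal(size=(D, K))    # shared by both views
w_reduce = rng.normal(size=(D, d))   # shared by both views
ground = rng.normal(size=(N, D))     # stand-in for ground-view patch features
aerial = rng.normal(size=(N, D))     # stand-in for aerial-view patch features

g_desc = aggregate(ground, w_score, w_reduce)
a_desc = aggregate(aerial, w_score, w_reduce)
similarity = float(g_desc @ a_desc)  # cosine similarity used for retrieval
```

Because both views pass through the same `w_score`, cluster k refers to the same slot in the ground and aerial descriptors, which is what lets a simple dot product compare them. Mass assigned to the dustbin column is discarded before aggregation, so features with no counterpart in the other view contribute less to the final descriptor.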

Original language: English
Article number: 1116
Journal: Remote Sensing
Volume: 18
Issue number: 8
Publication status: Published - Apr 2026
Externally published: Yes

Keywords

  • cross-view geo-localization
  • feature aggregation
  • generalization capacity
  • remote sensing imagery
  • vision foundation models
