Abstract
In recent years, domain generalization for remote sensing semantic segmentation has attracted increasing attention because variations in sensors, imaging conditions, and geographic regions cause significant domain shifts. Vision foundation models (VFMs) possess strong general-purpose feature extraction capabilities and transfer effectively across diverse data, showing great potential for domain-generalized semantic segmentation of remote sensing imagery. However, existing VFM-based parameter-efficient fine-tuning methods that operate in the spatial domain struggle to handle the pronounced cross-domain intraclass variations found in remote sensing. To address this issue, we propose the adaptive frequency-aware adapter (AFA-Adapter), which adaptively selects frequency components to improve cross-domain intraclass feature consistency. Building upon this, we propose the spatial multiprototype adapter (SMP-Adapter), which clusters the features of each land-cover category into multiple prototypes to model complex intraclass diversity. Land-cover features are then weighted by their nearest intraclass prototype, which enhances the discriminability of easily confused features at feature cluster boundaries. By integrating these two modules, we propose a frequency-spatial dual domain fine-tuning network (D2FT-Net), which effectively alleviates cross-domain intraclass variations and improves the generalization capability of VFMs for domain-generalized remote sensing semantic segmentation. Extensive experiments under four cross-domain settings demonstrate the effectiveness of the proposed D2FT-Net, which achieves an average mIoU improvement of 1.09% over state-of-the-art methods, with the best gain reaching 1.64%. The source code will be released at https://github.com/ssshen0315/D2FT-Net.
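The multiprototype idea described above (cluster the features of each class into several prototypes, then weight each feature by its nearest intraclass prototype) can be illustrated with a minimal NumPy sketch. This is not the authors' SMP-Adapter: the abstract does not specify the clustering algorithm or the weighting function, so the plain k-means routine, the cosine-similarity weight, and the names `kmeans`, `weight_by_nearest_prototype`, and `k` are all assumptions made for illustration.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering step).

    x: (n, d) float array. Returns a (k, d) array of centroids.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct samples (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every sample to every centroid.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centers[j] = members.mean(axis=0)
    return centers


def weight_by_nearest_prototype(feats, labels, k=3):
    """Hypothetical weighting: scale each feature by its cosine similarity
    to the most similar prototype of its own class.

    feats: (n, d) float array of per-pixel features.
    labels: (n,) integer class label for each feature.
    """
    out = np.empty_like(feats)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        x = feats[idx]
        protos = kmeans(x, min(k, len(x)))
        # L2-normalise so the dot product is a cosine similarity.
        xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
        pn = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
        sim = xn @ pn.T            # (n_c, k) feature-to-prototype similarity
        w = sim.max(axis=1)        # similarity to the nearest prototype
        out[idx] = x * w[:, None]  # features far from every prototype shrink
    return out
```

In this sketch, features that sit close to one of their class prototypes keep most of their magnitude, while features near cluster boundaries are down-weighted. The actual module may use a different weighting and operates on VFM feature maps rather than a flat array.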
| Original language | English |
|---|---|
| Article number | 5621016 |
| Journal | IEEE Transactions on Geoscience and Remote Sensing |
| Volume | 64 |
| Publication status | Published - 2026 |
Keywords
- Domain generalization
- fine-tuning
- remote sensing
- semantic segmentation
- vision foundation models (VFMs)
Fingerprint
Dive into the research topics of 'D2FT-Net: Frequency-Spatial Dual Domain Fine-Tuning of Vision Foundation Models for Remote Sensing Domain Generalization Semantic Segmentation'. Together they form a unique fingerprint.