VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection

  • Beijing Institute of Technology
  • National Key Laboratory of Science and Technology on Space-Born Intelligent Information Processing

Research output: Contribution to journal › Article › peer-review

Abstract

Unsupervised domain adaptation methods have been widely explored to bridge domain gaps. However, in real-world remote sensing scenarios, privacy and transmission constraints often preclude access to source-domain data, which limits the practical applicability of these methods. Recently, source-free object detection (SFOD) has emerged as a promising alternative that aims at cross-domain adaptation without relying on source data, primarily through a self-training paradigm. Despite its potential, SFOD frequently suffers from training collapse caused by noisy pseudolabels, especially in remote sensing imagery with dense objects and complex backgrounds. Because a limited number of target-domain annotations is often obtainable in practice, we propose a Vision foundation model-Guided DEtection TRansformer (VG-DETR), built upon a semi-supervised framework for remote sensing object detection under source-free constraints. VG-DETR integrates a vision foundation model (VFM) into the online training pipeline in a "free lunch" manner, leveraging a small amount of labeled target data to mitigate pseudolabel noise while improving the detector's feature-extraction capability. Specifically, we introduce a VFM-guided pseudolabel mining strategy that uses the VFM's semantic priors to further assess the reliability of the generated pseudolabels. By recovering potentially correct predictions from low-confidence outputs, this strategy improves both the quality and the quantity of pseudolabels. In addition, we propose a dual-level VFM-guided alignment method that aligns detector features with VFM embeddings at both the instance and image levels. Through contrastive learning among fine-grained prototypes and similarity matching between feature maps, this dual-level alignment further enhances the robustness of feature representations against domain gaps. Extensive experiments demonstrate that VG-DETR achieves superior performance in source-free remote sensing detection tasks.
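The idea of recovering low-confidence predictions with semantic priors can be pictured with a small sketch. This is not the procedure from the paper, whose details the abstract does not give. It assumes, purely for illustration, that each predicted box comes with an embedding of its region from a VFM, that one prototype embedding per class is available, and that agreement is measured by cosine similarity. The function name `mine_pseudolabels`, the dictionary keys, and all three thresholds are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pseudolabels(predictions, prototypes,
                      high_thr=0.7, low_thr=0.3, sim_thr=0.8):
    """Select pseudolabels from detector predictions.

    predictions: list of dicts with keys 'label', 'score', 'embedding'
                 ('embedding' stands in for a VFM feature of the box region).
    prototypes:  dict mapping each class label to a prototype embedding.
    All thresholds are illustrative values, not taken from the paper.
    """
    kept = []
    for pred in predictions:
        if pred["score"] >= high_thr:
            # Confident detections are accepted as usual in self-training.
            kept.append(pred)
        elif pred["score"] >= low_thr:
            # Low-confidence detections are recovered only when the
            # region embedding agrees with the predicted class prototype.
            if cosine(pred["embedding"], prototypes[pred["label"]]) >= sim_thr:
                kept.append(pred)
        # Predictions below low_thr are always discarded.
    return kept


prototypes = {"plane": [1.0, 0.0], "ship": [0.0, 1.0]}
predictions = [
    {"label": "plane", "score": 0.9, "embedding": [0.0, 1.0]},
    {"label": "ship", "score": 0.5, "embedding": [0.1, 0.9]},
    {"label": "plane", "score": 0.5, "embedding": [0.1, 0.9]},
    {"label": "ship", "score": 0.1, "embedding": [0.0, 1.0]},
]
kept = mine_pseudolabels(predictions, prototypes)
```

In this toy input, the second prediction is recovered because its embedding lies close to the "ship" prototype, while the third has the same score but is rejected because it disagrees with the "plane" prototype. A real pipeline would also have to decide how prototypes are built and updated, which this sketch leaves open.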

Original language: English
Pages (from-to): 16028-16043
Number of pages: 16
Journal: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Volume: 19
DOIs
Publication status: Published - 2026

Keywords

  • Mean teacher detection transformer
  • remote sensing imagery
  • vision foundation model (VFM)
