TY - JOUR
T1 - VFM-Guided Semi-Supervised Detection Transformer under Source-Free Constraints for Remote Sensing Object Detection
AU - Han, Jianhong
AU - Wang, Yupei
AU - Chen, Liang
AU - Zhang, Yuan
N1 - Publisher Copyright: © 2008-2012 IEEE.
PY - 2026
Y1 - 2026
AB - Unsupervised domain adaptation methods have been widely explored to bridge domain gaps. However, in real-world remote-sensing scenarios, privacy and transmission constraints often preclude access to source domain data, which limits their practical applicability. Recently, source-free object detection (SFOD) has emerged as a promising alternative, aiming at cross-domain adaptation without relying on source data, primarily through a self-training paradigm. Despite its potential, SFOD frequently suffers from training collapse caused by noisy pseudolabels, especially in remote sensing imagery with dense objects and complex backgrounds. Considering that limited target domain annotations are often feasible in practice, we propose a Vision foundation model-Guided DEtection TRansformer (VG-DETR), built upon a semi-supervised framework for remote sensing object detection under source-free constraints. VG-DETR integrates a vision foundation model (VFM) into the online training pipeline in a free lunch manner, leveraging a small amount of labeled target data to mitigate pseudolabel noise while improving the detector's feature-extraction capability. Specifically, we introduce a VFM-guided pseudolabel mining strategy that leverages the VFM's semantic priors to further assess the reliability of the generated pseudolabels. By recovering potentially correct predictions from low-confidence outputs, our strategy improves pseudolabel quality and quantity. In addition, a dual-level VFM-guided alignment method is proposed, which aligns detector features with VFM embeddings at both the instance and image levels. Through contrastive learning among fine-grained prototypes and similarity matching between feature maps, this dual-level alignment further enhances the robustness of feature representations against domain gaps. Extensive experiments demonstrate that VG-DETR achieves superior performance in source-free remote sensing detection tasks.
KW - Mean teacher detection transformer
KW - remote sensing imagery
KW - vision foundation model (VFM)
UR - https://www.scopus.com/pages/publications/105037805995
U2 - 10.1109/JSTARS.2026.3689075
DO - 10.1109/JSTARS.2026.3689075
M3 - Article
AN - SCOPUS:105037805995
SN - 1939-1404
VL - 19
SP - 16028
EP - 16043
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -