Remote Sensing Teacher: Cross-Domain Detection Transformer with Learnable Frequency-Enhanced Feature Alignment in Remote Sensing Imagery

Jianhong Han; Wenjie Yang; Yupei Wang; Liang Chen; Zhaoyi Luo

doi:10.1109/TGRS.2024.3378284

Jianhong Han, Wenjie Yang, Yupei Wang^*, Liang Chen, Zhaoyi Luo

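^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract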

Unsupervised domain adaptation (UDA) is critical for remote sensing object detection in real applications, aiming to address the significant performance degradation issue caused by the domain gap between the source and target domain. This method achieves cross-domain alignment by leveraging the unlabeled target domain data, thus avoiding the expensive annotation cost. However, existing works mainly cope with convolutional neural network (CNN)-based object detectors, which are characterized by complex adversarial learning architecture and fail to accurately align the features in remote sensing images with sparsely allocated objects and inevitable background noise. Compared to CNN-based methods, the detection transformer (DETR) largely simplifies the object detection pipeline and demonstrates the great potential of its intrinsic characteristics of global relation modeling between any pixels. On this basis, we propose the first strong DETR-based baseline, remote sensing teacher, for UDA in remote sensing object detection. Specifically, the remote sensing teacher introduces an innovative learnable frequency-enhanced feature alignment (LFA) module. Within this module, we initially transform the features into frequency space to simplify the attention solver and effectively capture domain-specific information. Subsequently, the module significantly enhances the global feature representations of sparsely allocated objects by using a lightweight attention mechanism. Following this, the module incorporates learnable filters with a gated mechanism, enabling selective alignment of features in noisy backgrounds. In addition, the remote sensing teacher employs a self-adaptive pseudo-label assigner (SPA) that can automatically adjust the class-wise confidence threshold according to the model's learning status, thereby enabling the generation of high-quality pseudo-labels in scenarios with a long-tailed distribution. 
Leveraging these pseudo-labels further mitigates the domain bias of the detector by establishing alignment at the label level. Extensive experimental results demonstrate the superior performance and generalization capabilities of our proposed remote sensing teacher in multiple remote sensing adaptation scenarios. The Code is released at https://github.com/h751410234/RemoteSensingTeacher.
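
The abstract says the SPA adjusts class-wise confidence thresholds according to the model's learning status, but it does not give the update rule. The following is only a minimal sketch of one common way to do this, not the authors' method: it assumes a running mean of predicted confidence per class as the "learning status" and scales a base threshold by each class's confidence relative to the best-learned class. All function names, parameters, and default values here are hypothetical.

```python
# Hypothetical sketch of class-wise adaptive pseudo-label thresholds.
# The scaling rule, names, and defaults are assumptions, not taken from the paper.

def update_running_conf(running_conf, detections, momentum=0.9):
    """Update an exponential moving average of the mean score per class.

    detections is a list of (class_name, score) pairs from the teacher model.
    """
    sums, counts = {}, {}
    for cls, score in detections:
        sums[cls] = sums.get(cls, 0.0) + score
        counts[cls] = counts.get(cls, 0) + 1
    updated = dict(running_conf)
    for cls, total in sums.items():
        batch_mean = total / counts[cls]
        prev = updated.get(cls, batch_mean)  # first sighting: start at batch mean
        updated[cls] = momentum * prev + (1.0 - momentum) * batch_mean
    return updated


def classwise_thresholds(running_conf, base=0.5, floor=0.2):
    """Lower the threshold for classes the model is less confident about."""
    if not running_conf:
        return {}
    top = max(running_conf.values())
    return {cls: max(floor, base * conf / top) for cls, conf in running_conf.items()}


def select_pseudo_labels(detections, thresholds):
    """Keep detections whose score reaches their class threshold."""
    return [(cls, s) for cls, s in detections
            if s >= thresholds.get(cls, float("inf"))]


if __name__ == "__main__":
    conf = {"plane": 0.8, "ship": 0.4}  # "ship" plays the rare, poorly learned class
    thresholds = classwise_thresholds(conf)
    dets = [("plane", 0.6), ("plane", 0.45), ("ship", 0.3), ("ship", 0.2)]
    print(thresholds)
    print(select_pseudo_labels(dets, thresholds))
```

Under these assumptions a rare class with low running confidence gets a lower threshold, so some of its detections survive as pseudo-labels instead of being filtered out by a single global cutoff, which is the long-tailed behavior the abstract describes.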

Original language	English
Article number	5619814
Pages (from-to)	1-14
Number of pages	14
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	62
DOI	https://doi.org/10.1109/TGRS.2024.3378284
Publication status	Published - 2024

Cite this

@article{7d90849e82dc4fb5a2deae30024271bb,

title = "Remote Sensing Teacher: Cross-Domain Detection Transformer with Learnable Frequency-Enhanced Feature Alignment in Remote Sensing Imagery",

abstract = "Unsupervised domain adaptation (UDA) is critical for remote sensing object detection in real applications, aiming to address the significant performance degradation issue caused by the domain gap between the source and target domain. This method achieves cross-domain alignment by leveraging the unlabeled target domain data, thus avoiding the expensive annotation cost. However, existing works mainly cope with convolutional neural network (CNN)-based object detectors, which are characterized by complex adversarial learning architecture and fail to accurately align the features in remote sensing images with sparsely allocated objects and inevitable background noise. Compared to CNN-based methods, the detection transformer (DETR) largely simplifies the object detection pipeline and demonstrates the great potential of its intrinsic characteristics of global relation modeling between any pixels. On this basis, we propose the first strong DETR-based baseline, remote sensing teacher, for UDA in remote sensing object detection. Specifically, the remote sensing teacher introduces an innovative learnable frequency-enhanced feature alignment (LFA) module. Within this module, we initially transform the features into frequency space to simplify the attention solver and effectively capture domain-specific information. Subsequently, the module significantly enhances the global feature representations of sparsely allocated objects by using a lightweight attention mechanism. Following this, the module incorporates learnable filters with a gated mechanism, enabling selective alignment of features in noisy backgrounds. In addition, the remote sensing teacher employs a self-adaptive pseudo-label assigner (SPA) that can automatically adjust the class-wise confidence threshold according to the model's learning status, thereby enabling the generation of high-quality pseudo-labels in scenarios with a long-tailed distribution. 
Leveraging these pseudo-labels further mitigates the domain bias of the detector by establishing alignment at the label level. Extensive experimental results demonstrate the superior performance and generalization capabilities of our proposed remote sensing teacher in multiple remote sensing adaptation scenarios. The Code is released at https://github.com/h751410234/RemoteSensingTeacher.",

keywords = "Object detection, remote sensing imagery, unsupervised domain adaptation (UDA)",

author = "Jianhong Han and Wenjie Yang and Yupei Wang and Liang Chen and Zhaoyi Luo",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2024",

doi = "10.1109/TGRS.2024.3378284",

language = "English",

volume = "62",

pages = "1--14",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Remote Sensing Teacher

T2 - Cross-Domain Detection Transformer with Learnable Frequency-Enhanced Feature Alignment in Remote Sensing Imagery

AU - Han, Jianhong

AU - Yang, Wenjie

AU - Wang, Yupei

AU - Chen, Liang

AU - Luo, Zhaoyi

PY - 2024

Y1 - 2024

N2 - Unsupervised domain adaptation (UDA) is critical for remote sensing object detection in real applications, aiming to address the significant performance degradation issue caused by the domain gap between the source and target domain. This method achieves cross-domain alignment by leveraging the unlabeled target domain data, thus avoiding the expensive annotation cost. However, existing works mainly cope with convolutional neural network (CNN)-based object detectors, which are characterized by complex adversarial learning architecture and fail to accurately align the features in remote sensing images with sparsely allocated objects and inevitable background noise. Compared to CNN-based methods, the detection transformer (DETR) largely simplifies the object detection pipeline and demonstrates the great potential of its intrinsic characteristics of global relation modeling between any pixels. On this basis, we propose the first strong DETR-based baseline, remote sensing teacher, for UDA in remote sensing object detection. Specifically, the remote sensing teacher introduces an innovative learnable frequency-enhanced feature alignment (LFA) module. Within this module, we initially transform the features into frequency space to simplify the attention solver and effectively capture domain-specific information. Subsequently, the module significantly enhances the global feature representations of sparsely allocated objects by using a lightweight attention mechanism. Following this, the module incorporates learnable filters with a gated mechanism, enabling selective alignment of features in noisy backgrounds. In addition, the remote sensing teacher employs a self-adaptive pseudo-label assigner (SPA) that can automatically adjust the class-wise confidence threshold according to the model's learning status, thereby enabling the generation of high-quality pseudo-labels in scenarios with a long-tailed distribution. 
Leveraging these pseudo-labels further mitigates the domain bias of the detector by establishing alignment at the label level. Extensive experimental results demonstrate the superior performance and generalization capabilities of our proposed remote sensing teacher in multiple remote sensing adaptation scenarios. The Code is released at https://github.com/h751410234/RemoteSensingTeacher.


KW - Object detection

KW - remote sensing imagery

KW - unsupervised domain adaptation (UDA)

UR - http://www.scopus.com/inward/record.url?scp=85188538642&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2024.3378284

DO - 10.1109/TGRS.2024.3378284

M3 - Article

AN - SCOPUS:85188538642

SN - 0196-2892

VL - 62

SP - 1

EP - 14

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5619814

ER -
