DETR-ORD: An Improved DETR Detector for Oriented Remote Sensing Object Detection with Feature Reconstruction and Dynamic Query

Xiaohai He, Kaiwen Liang, Weimin Zhang*, Fangxing Li, Zhou Jiang, Zhengqing Zuo, Xinyan Tan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Optical remote sensing images often feature high resolution, dense target distribution, and uneven target sizes. While transformer-based detectors like DETR reduce manually designed components, DETR does not support arbitrary-oriented object detection and suffers from high computational costs and slow convergence when handling large sequences of images. Additionally, bipartite graph matching and the limit on the number of queries cause transformer-based detectors to perform poorly in scenes with many objects and small object sizes. We propose an improved DETR detector for Oriented remote sensing object detection with Feature Reconstruction and Dynamic Query, termed DETR-ORD. It introduces rotation into the transformer architecture for oriented object detection, reduces computational cost with a hybrid encoder, and includes an image feature reconstruction (IFR) module to address the loss of positional information caused by the flattening operation. It also uses ATSS to select auxiliary dynamic training queries for the decoder. With similar backbone network parameters, this improved DETR-based detector enhances detection performance in challenging oriented optical remote sensing scenarios. Our approach achieves superior results on most optical remote sensing datasets, such as DOTA-v1.5 (72.07% mAP) and DIOR-R (66.60% mAP), surpassing the baseline detector.
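The abstract names ATSS (Adaptive Training Sample Selection) as the way auxiliary training queries are chosen, without giving details. The following is a minimal sketch of the core ATSS statistic only: for one ground-truth box, candidates are kept when their IoU reaches the mean plus the standard deviation of all candidate IoUs. The function name `atss_select`, the use of query boxes as candidates, and the sample (rather than population) standard deviation are assumptions made for illustration, not the paper's implementation. The full ATSS procedure also preselects candidates by center distance and checks that each center lies inside the ground-truth box, which this sketch omits.

```python
from statistics import mean, stdev


def atss_select(ious):
    """Return indices of candidates whose IoU meets the adaptive threshold.

    ious: IoU of each candidate (hypothetically, a query box) with a single
    ground-truth box. The threshold is mean(ious) + stdev(ious).
    """
    # With fewer than two candidates there is no spread to measure,
    # so this sketch simply keeps whatever is there (an assumption).
    if len(ious) < 2:
        return list(range(len(ious)))
    threshold = mean(ious) + stdev(ious)
    return [i for i, iou in enumerate(ious) if iou >= threshold]


# Illustrative IoU values, not data from the paper.
candidate_ious = [0.05, 0.10, 0.15, 0.20, 0.80]
selected = atss_select(candidate_ious)
print(selected)
```

Because the threshold is derived from the candidates themselves, a ground-truth box with uniformly low IoUs still receives some positive samples, which is the property that makes the rule attractive for small or densely packed objects.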

Original language: English
Article number: 3516
Journal: Remote Sensing
Volume: 16
Issue number: 18
Publication status: Published - Sept 2024

Keywords

  • deep learning
  • optical remote sensing images
  • oriented object detection
  • transformer
