TY - JOUR
T1 - DTSSNet: Dynamic Training Sample Selection Network for UAV Object Detection
AU - Chen, Li
AU - Liu, Chaoyang
AU - Li, Wei
AU - Xu, Qizhi
AU - Deng, Hongbin
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Object detectors often struggle with accuracy and generalization when applied to aerial imagery, primarily due to two challenges: 1) large scale variation of objects in aerial images: both extremely small and large objects are visible in the same image; and 2) an extreme imbalance between positive and negative training anchors: there are few positive ground truth (GT) anchors and an abundance of negative anchors. In this article, we propose a dynamic training sample selection network (DTSSNet) that addresses both problems. An attention-enhanced feature module (AEFM) is proposed to enhance the basic features by focusing on both channel and semantic information related to targets. This module provides more valuable information for accurately classifying objects of different scales. To tackle the imbalance in training samples, this article implements a dynamic training sample selection (DTSS) module that divides the training samples based on GT information. This module dynamically selects samples, ensuring a more balanced representation of positive and negative anchors and leading to improved learning. Importantly, the combination of AEFM and DTSS does not introduce any additional computational cost. Experimental evaluations on the VisDrone2019-DET dataset demonstrate that DTSSNet outperforms base detectors and generic approaches. Furthermore, the effectiveness of DTSSNet is validated on the UAVDT benchmark dataset, where it achieves state-of-the-art performance.
KW - Attention enhanced feature
KW - dynamic training sample selection (DTSS)
KW - object detection
KW - unmanned aerial vehicle (UAV) aerial imagery
UR - http://www.scopus.com/inward/record.url?scp=85181556132&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3348555
DO - 10.1109/TGRS.2023.3348555
M3 - Article
AN - SCOPUS:85181556132
SN - 0196-2892
VL - 62
SP - 1
EP - 16
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5902516
ER -