Abstract
Unmanned aerial vehicles (UAVs) have been applied to inspection tasks in various scenarios due to their high efficiency, low cost, and excellent mobility. However, objects in aerial images are much smaller and denser than general objects, making it difficult for current object detection methods to achieve the expected results. To solve this issue, a prior enhanced Transformer network (PETNet) based on YOLO is proposed in this paper. Specifically, a novel prior enhanced Transformer (PET) module and a one-to-many feature fusion (OMFF) mechanism are embedded into the network, and two additional detection heads are added to the shallow feature maps. In this work, PET is used to capture enhanced global information to improve the expressive ability of the network, while OMFF fuses multi-type features to minimize the information loss of small objects. In addition, the added detection heads make it more likely that smaller-scale objects are detected, and the extended multi-head parallel detection is better suited to the multi-scale variation of objects in aerial images. On the VisDrone-2021 and UAVDT datasets, the proposed PETNet achieves state-of-the-art results with average precision (AP) of 35.3 and 21.5, respectively, which indicates that the proposed network is well suited to aerial image detection and is of great reference value.
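The abstract describes three architectural ingredients: extra detection heads on shallow (high-resolution) feature maps, and a one-to-many fusion that spreads one feature map across several pyramid levels. The sketch below is a minimal illustration of those two ideas in a YOLO-style setting, not the paper's implementation; the module names, channel counts, and the additive fusion rule are assumptions made for the example.

```python
# Minimal sketch (not PETNet's actual code): an extra shallow detection head and a
# naive one-to-many feature fusion in a YOLO-style pyramid. All names, channel
# sizes, and the fusion rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OneToManyFusion(nn.Module):
    """Fuse one source feature map into several target scales (assumed OMFF-like idea)."""

    def __init__(self, src_channels, tgt_channels):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(src_channels, c, kernel_size=1) for c in tgt_channels
        )

    def forward(self, src, targets):
        fused = []
        for proj, tgt in zip(self.proj, targets):
            # Resize the projected source map to each target's resolution and add it.
            resized = F.interpolate(proj(src), size=tgt.shape[-2:], mode="nearest")
            fused.append(tgt + resized)
        return fused


class MultiHeadDetector(nn.Module):
    """Parallel detection heads, including extra heads on shallow (high-resolution) maps."""

    def __init__(self, channels, num_anchors=3, num_classes=10):
        super().__init__()
        out_ch = num_anchors * (5 + num_classes)  # box (4) + objectness (1) + classes
        self.heads = nn.ModuleList(
            nn.Conv2d(c, out_ch, kernel_size=1) for c in channels
        )

    def forward(self, feats):
        return [head(f) for head, f in zip(self.heads, feats)]


if __name__ == "__main__":
    # Shallow (high-resolution) source feature and three deeper pyramid levels.
    src = torch.randn(1, 64, 160, 160)
    pyramid = [torch.randn(1, 128, 80, 80),
               torch.randn(1, 256, 40, 40),
               torch.randn(1, 512, 20, 20)]

    fusion = OneToManyFusion(64, [128, 256, 512])
    fused = fusion(src, pyramid)

    detector = MultiHeadDetector([64, 128, 256, 512])
    outputs = detector([src] + fused)  # extra head on the shallow map
    print([o.shape for o in outputs])
```

The key point the sketch tries to convey is that shallow maps keep the fine spatial detail that small objects need, so giving them their own detection head and reinjecting their information into deeper levels reduces the loss of small-object cues.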
| Original language | English |
|---|---|
| Article number | 126384 |
| Journal | Neurocomputing |
| Volume | 547 |
| DOIs | |
| Publication status | Published - 28 Aug 2023 |
Keywords
- Aerial image
- Deep learning
- Small object detection
- Transformer