TY - JOUR
T1 - NRT-YOLO
T2 - Improved YOLOv5 Based on Nested Residual Transformer for Tiny Remote Sensing Object Detection
AU - Liu, Yukuan
AU - He, Guanglin
AU - Wang, Zehu
AU - Li, Weizhe
AU - Huang, Hongfei
N1 - Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - To address the problems of tiny objects and high resolution of object detection in remote sensing imagery, the methods with coarse-grained image cropping have been widely studied. However, these methods are always inefficient and complex due to the two-stage architecture and the huge computation for split images. For these reasons, this article employs YOLO and presents an improved architecture, NRT-YOLO. Specifically, the improvements can be summarized as: extra prediction head and related feature fusion layers; novel nested residual Transformer module, C3NRT; nested residual attention module, C3NRA; and multi-scale testing. The C3NRT module presented in this paper could boost accuracy and reduce complexity of the network at the same time. Moreover, the effectiveness of the proposed method is demonstrated by three kinds of experiments. NRT-YOLO achieves 56.9% mAP0.5 with only 38.1 M parameters in the DOTA dataset, exceeding YOLOv5l by 4.5%. Also, the results of different classifications show its excellent ability to detect small sample objects. As for the C3NRT module, the ablation study and comparison experiment verified that it has the largest contribution to accuracy increment (2.7% in mAP0.5) among the improvements. In conclusion, NRT-YOLO has excellent performance in accuracy improvement and parameter reduction, which is suitable for tiny remote sensing object detection.
AB - To address the problems of tiny objects and high resolution of object detection in remote sensing imagery, the methods with coarse-grained image cropping have been widely studied. However, these methods are always inefficient and complex due to the two-stage architecture and the huge computation for split images. For these reasons, this article employs YOLO and presents an improved architecture, NRT-YOLO. Specifically, the improvements can be summarized as: extra prediction head and related feature fusion layers; novel nested residual Transformer module, C3NRT; nested residual attention module, C3NRA; and multi-scale testing. The C3NRT module presented in this paper could boost accuracy and reduce complexity of the network at the same time. Moreover, the effectiveness of the proposed method is demonstrated by three kinds of experiments. NRT-YOLO achieves 56.9% mAP0.5 with only 38.1 M parameters in the DOTA dataset, exceeding YOLOv5l by 4.5%. Also, the results of different classifications show its excellent ability to detect small sample objects. As for the C3NRT module, the ablation study and comparison experiment verified that it has the largest contribution to accuracy increment (2.7% in mAP0.5) among the improvements. In conclusion, NRT-YOLO has excellent performance in accuracy improvement and parameter reduction, which is suitable for tiny remote sensing object detection.
KW - YOLOv5
KW - nested residual transformer
KW - remote sensing imagery
KW - tiny object detection
UR - http://www.scopus.com/inward/record.url?scp=85133137102&partnerID=8YFLogxK
U2 - 10.3390/s22134953
DO - 10.3390/s22134953
M3 - Article
C2 - 35808445
AN - SCOPUS:85133137102
SN - 1424-8220
VL - 22
JO - Sensors
JF - Sensors
IS - 13
M1 - 4953
ER -