TY - GEN
T1 - RSTS-YOLOv5
T2 - 11th China Conference on Command and Control, C2 2023
AU - Liu, Juan Xiu
AU - Li, Jiachen
AU - Hao, Ruqian
AU - Yang, Yanlong
AU - Zhang, Jing Ming
AU - Wang, Xiangzhou
AU - Lu, Guoming
AU - Zhang, Ping
AU - Zhang, Jing
AU - Liu, Yong
AU - Liu, Lin
AU - Wang, Xingguo
AU - Deng, Hao
AU - Wang, Dongdong
AU - Du, Xiaohui
N1 - Publisher Copyright:
© Chinese Institute of Command and Control 2024.
PY - 2024
Y1 - 2024
N2 - Despite tremendous progress in object detection in recent years, object detection on drone-captured images remains a great challenge because of the large number of small objects that appear densely and occlude each other in such images. To address this difficulty, we propose a robust and efficient deep learning network, RSTS-YOLOv5. We construct a Res Swin Transformer stage (RSTS) based on the Swin Transformer stage to extract global and contextual information, and embed it in YOLOv5x to explore where a transformer-based structure is best added in the detection network. In addition, we propose a multi-scale data augmentation for object detection on drone-captured images, which enhances the robustness of the model to objects of different scales without introducing additional computation. Experimental results show that the proposed RSTS-YOLOv5 achieves a mAP of 34.72% on the VisDrone test-dev subset and 34.84% on the validation-dev subset. Moreover, RSTS-YOLOv5 generalizes well across various drone-captured scenes and is highly competitive in object detection tasks on drone-captured images.
AB - Despite tremendous progress in object detection in recent years, object detection on drone-captured images remains a great challenge because of the large number of small objects that appear densely and occlude each other in such images. To address this difficulty, we propose a robust and efficient deep learning network, RSTS-YOLOv5. We construct a Res Swin Transformer stage (RSTS) based on the Swin Transformer stage to extract global and contextual information, and embed it in YOLOv5x to explore where a transformer-based structure is best added in the detection network. In addition, we propose a multi-scale data augmentation for object detection on drone-captured images, which enhances the robustness of the model to objects of different scales without introducing additional computation. Experimental results show that the proposed RSTS-YOLOv5 achieves a mAP of 34.72% on the VisDrone test-dev subset and 34.84% on the validation-dev subset. Moreover, RSTS-YOLOv5 generalizes well across various drone-captured scenes and is highly competitive in object detection tasks on drone-captured images.
KW - Deep learning
KW - Drone-captured images
KW - Multi-scale data augmentation
KW - Tiny object detection
UR - http://www.scopus.com/inward/record.url?scp=85185726048&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-9021-4_35
DO - 10.1007/978-981-99-9021-4_35
M3 - Conference contribution
AN - SCOPUS:85185726048
SN - 9789819990207
T3 - Lecture Notes in Electrical Engineering
SP - 355
EP - 366
BT - Proceedings of 2023 11th China Conference on Command and Control
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 24 October 2023 through 25 October 2023
ER -