TY - JOUR
T1 - RMT-YOLOv9s
T2 - An Infrared Small Target Detection Method Based on UAV Remote Sensing Images
AU - Xu, Keyu
AU - Song, Chengtian
AU - Xie, Yue
AU - Pan, Lizhi
AU - Gan, Xiaozheng
AU - Huang, Gao
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Unmanned aerial vehicles (UAVs) and infrared imaging technology have numerous applications in civilian fields. To address the low accuracy caused by complex ground backgrounds, small target size, and limited target features in infrared target detection on UAV remote sensing images, we combine the YOLOv9s model with the latest retentive networks meet vision transformers (RMT) technology and propose the RMT-YOLOv9s model for infrared small target detection. First, a convolutional neural network (CNN)-RMT-based backbone is proposed by incorporating the RMT model into the backbone network of YOLOv9s, which extracts both local and global features for small target detection. Then, an improved neck multiscale feature-fusion network, RMTELAN-PANet, is designed using the novel convolutional RMTELAN module proposed in this letter, which can better capture and use semantic information from feature maps. Finally, an efficient multiscale attention (EMA) module and a Dysample upsampling module are integrated into RMTELAN-PANet to further enhance the feature information of small targets. Experiments on the HIT-UAV dataset show that RMT-YOLOv9s outperforms other popular methods in infrared small target detection.
AB - Unmanned aerial vehicles (UAVs) and infrared imaging technology have numerous applications in civilian fields. To address the low accuracy caused by complex ground backgrounds, small target size, and limited target features in infrared target detection on UAV remote sensing images, we combine the YOLOv9s model with the latest retentive networks meet vision transformers (RMT) technology and propose the RMT-YOLOv9s model for infrared small target detection. First, a convolutional neural network (CNN)-RMT-based backbone is proposed by incorporating the RMT model into the backbone network of YOLOv9s, which extracts both local and global features for small target detection. Then, an improved neck multiscale feature-fusion network, RMTELAN-PANet, is designed using the novel convolutional RMTELAN module proposed in this letter, which can better capture and use semantic information from feature maps. Finally, an efficient multiscale attention (EMA) module and a Dysample upsampling module are integrated into RMTELAN-PANet to further enhance the feature information of small targets. Experiments on the HIT-UAV dataset show that RMT-YOLOv9s outperforms other popular methods in infrared small target detection.
KW - Dysample
KW - YOLOv9
KW - efficient multiscale attention (EMA)
KW - retentive networks meet vision transformer (RMT) transformer
KW - unmanned aerial vehicle (UAV) infrared target detection
UR - http://www.scopus.com/inward/record.url?scp=85207444213&partnerID=8YFLogxK
U2 - 10.1109/LGRS.2024.3484748
DO - 10.1109/LGRS.2024.3484748
M3 - Article
AN - SCOPUS:85207444213
SN - 1545-598X
VL - 21
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
M1 - 7002205
ER -