TY - JOUR
T1 - Single-frame Infrared Small Target Detection with Dynamic Multi-dimensional Convolution
AU - Zhou, Shichao
AU - Zhang, Zekai
AU - Zhao, Yingrui
AU - Wang, Wenzheng
AU - Wang, Zhuowei
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Largely as a result of remote imaging, targets of interest in infrared imagery tend to occupy very few pixels and have faint radiation values. The lack of discriminative spatial features in infrared small targets challenges traditional single-frame detectors, which rely on handcrafted filter engineering to amplify local contrast. Recently, detectors based on Deep Convolutional Networks (DCNs) have employed elaborate multi-scale spatial context representations to "semantically reason" about small and dim infrared targets at the pixel level. However, the repeated spatial convolution-downsampling operations adopted by these leading methods can cause loss of target appearance information during the initial feature encoding stage. To further enhance low-level feature representation capacity, we draw on the insight of the traditional matched filter and propose a novel pixel-adaptive convolution kernel modulated by multi-dimensional contexts (i.e., Dynamic Multi-dimensional Convolution, DMConv). Specifically, DMConv is refined by three collaborative and indispensable attention functions that focus on the spatial layout, channel, and kernel number of the convolution kernel, respectively, so as to effectively mine, highlight, and enrich fine-grained spatial features with a moderate computational burden. Extensive experiments on two real-world single-frame infrared image datasets, SIRST and IRSTD-1k, demonstrate the effectiveness of the proposed method, which achieves consistent performance improvements over other state-of-the-art detectors.
AB - Largely as a result of remote imaging, targets of interest in infrared imagery tend to occupy very few pixels and have faint radiation values. The lack of discriminative spatial features in infrared small targets challenges traditional single-frame detectors, which rely on handcrafted filter engineering to amplify local contrast. Recently, detectors based on Deep Convolutional Networks (DCNs) have employed elaborate multi-scale spatial context representations to "semantically reason" about small and dim infrared targets at the pixel level. However, the repeated spatial convolution-downsampling operations adopted by these leading methods can cause loss of target appearance information during the initial feature encoding stage. To further enhance low-level feature representation capacity, we draw on the insight of the traditional matched filter and propose a novel pixel-adaptive convolution kernel modulated by multi-dimensional contexts (i.e., Dynamic Multi-dimensional Convolution, DMConv). Specifically, DMConv is refined by three collaborative and indispensable attention functions that focus on the spatial layout, channel, and kernel number of the convolution kernel, respectively, so as to effectively mine, highlight, and enrich fine-grained spatial features with a moderate computational burden. Extensive experiments on two real-world single-frame infrared image datasets, SIRST and IRSTD-1k, demonstrate the effectiveness of the proposed method, which achieves consistent performance improvements over other state-of-the-art detectors.
KW - adaptive filters
KW - convolutional neural network
KW - Infrared image
KW - multi-dimensional information fusion
UR - http://www.scopus.com/inward/record.url?scp=105003391771&partnerID=8YFLogxK
U2 - 10.1109/LGRS.2025.3563588
DO - 10.1109/LGRS.2025.3563588
M3 - Article
AN - SCOPUS:105003391771
SN - 1545-598X
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
ER -