TY - JOUR
T1 - Visible/Infrared Image Registration Based on Region-Adaptive Contextual Multifeatures
AU - Zhao, Qisen
AU - Dong, Liquan
AU - Liu, Ming
AU - Kong, Lingqin
AU - Chu, Xuhong
AU - Hui, Mei
AU - Zhao, Yuejin
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Visible (VIS) and infrared image registration is a challenging problem in computer vision because the two modalities differ significantly in appearance and physical properties. A single feature type is insufficient to resolve these nonlinear differences, and existing matching methods face a trade-off between high-resolution feature maps and the computational cost of transformer models. In this article, we propose a novel method, adaptive-neighborhood contextual multifeatures (ANCM-Net), for VIS/infrared image registration. Our method addresses the limitations of existing approaches by combining deep features with cross-modal similar contour features to form contextual feature representations. In addition, we propose a region-spanning adaptive cross-attention module that mitigates low spatial resolution and redundancy in the attention computation: by adjusting the attention region, it encodes the limited information at each attention location within a cross-modal adaptive region. In the matching task, we compute an adaptive attention region for each pixel in the cross-modal image and jointly encode and match the deep features and edge features. As a result, ANCM-Net not only preserves long-range dependencies in the image feature structure but also achieves fine-grained attention between highly correlated pixels. By extracting cross-modally consistent contextual features to compensate for modality-specific information, our approach improves cross-modal matching performance. Extensive experiments on real-world captured thermal infrared (TIR) and VIS datasets demonstrate that ANCM-Net outperforms existing image matching methods.
AB - Visible (VIS) and infrared image registration is a challenging problem in computer vision because the two modalities differ significantly in appearance and physical properties. A single feature type is insufficient to resolve these nonlinear differences, and existing matching methods face a trade-off between high-resolution feature maps and the computational cost of transformer models. In this article, we propose a novel method, adaptive-neighborhood contextual multifeatures (ANCM-Net), for VIS/infrared image registration. Our method addresses the limitations of existing approaches by combining deep features with cross-modal similar contour features to form contextual feature representations. In addition, we propose a region-spanning adaptive cross-attention module that mitigates low spatial resolution and redundancy in the attention computation: by adjusting the attention region, it encodes the limited information at each attention location within a cross-modal adaptive region. In the matching task, we compute an adaptive attention region for each pixel in the cross-modal image and jointly encode and match the deep features and edge features. As a result, ANCM-Net not only preserves long-range dependencies in the image feature structure but also achieves fine-grained attention between highly correlated pixels. By extracting cross-modally consistent contextual features to compensate for modality-specific information, our approach improves cross-modal matching performance. Extensive experiments on real-world captured thermal infrared (TIR) and VIS datasets demonstrate that ANCM-Net outperforms existing image matching methods.
KW - Adaptive-neighborhood
KW - contextual multifeatures
KW - cross-modality matching
KW - image matching
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85189611347&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3385088
DO - 10.1109/TGRS.2024.3385088
M3 - Article
AN - SCOPUS:85189611347
SN - 0196-2892
VL - 62
SP - 1
EP - 17
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5002717
ER -