TY - JOUR
T1 - Visible/Infrared Image Registration Based on Region-Adaptive Contextual Multifeatures
AU - Zhao, Qisen
AU - Dong, Liquan
AU - Liu, Ming
AU - Kong, Lingqin
AU - Chu, Xuhong
AU - Hui, Mei
AU - Zhao, Yuejin
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Visible (VIS) and infrared image registration is a challenging problem in computer vision because the two modalities differ significantly in appearance and physical properties. A single feature type is insufficient to resolve these nonlinear differences, and existing matching methods face a trade-off between high-resolution feature maps and the computational cost of transformer models. In this article, we propose a novel method, adaptive-neighborhood contextual multifeatures (ANCM-Net), for VIS/infrared image registration. Our method addresses the limitations of existing approaches by combining deep features with cross-modal similar contour features to form contextual feature representations. In addition, we propose a region-spanning adaptive cross-attention module that mitigates low spatial resolution and redundancy in the attention computation: by adjusting the attention region, it encodes the limited information at each attention location within a cross-modal adaptive region. In the matching task, we compute an adaptive attention region for each pixel in the cross-modal image and jointly encode and match the deep features and edge features. As a result, ANCM-Net not only preserves long-range dependencies in the image feature structure but also achieves fine-grained attention between highly correlated pixels. By extracting cross-modally consistent contextual features to compensate for modality-specific information, our approach improves cross-modal matching performance. Extensive experiments on real-world captured thermal infrared (TIR) and VIS datasets demonstrate that ANCM-Net outperforms existing image matching methods.
AB - Visible (VIS) and infrared image registration is a challenging problem in computer vision because the two modalities differ significantly in appearance and physical properties. A single feature type is insufficient to resolve these nonlinear differences, and existing matching methods face a trade-off between high-resolution feature maps and the computational cost of transformer models. In this article, we propose a novel method, adaptive-neighborhood contextual multifeatures (ANCM-Net), for VIS/infrared image registration. Our method addresses the limitations of existing approaches by combining deep features with cross-modal similar contour features to form contextual feature representations. In addition, we propose a region-spanning adaptive cross-attention module that mitigates low spatial resolution and redundancy in the attention computation: by adjusting the attention region, it encodes the limited information at each attention location within a cross-modal adaptive region. In the matching task, we compute an adaptive attention region for each pixel in the cross-modal image and jointly encode and match the deep features and edge features. As a result, ANCM-Net not only preserves long-range dependencies in the image feature structure but also achieves fine-grained attention between highly correlated pixels. By extracting cross-modally consistent contextual features to compensate for modality-specific information, our approach improves cross-modal matching performance. Extensive experiments on real-world captured thermal infrared (TIR) and VIS datasets demonstrate that ANCM-Net outperforms existing image matching methods.
KW - Adaptive-neighborhood
KW - contextual multifeatures
KW - cross-modality matching
KW - image matching
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85189611347&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3385088
DO - 10.1109/TGRS.2024.3385088
M3 - Article
AN - SCOPUS:85189611347
SN - 0196-2892
VL - 62
SP - 1
EP - 17
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5002717
ER -