Learning to Match Ground Camera Image and UAV 3-D Model-Rendered Image Based on Siamese Network with Attention Mechanism

Weiquan Liu; Cheng Wang; Xuesheng Bian; Shuting Chen; Shangshu Yu; Xiuhong Lin; Shang Hong Lai; Dongdong Weng; Jonathan Li

doi:10.1109/LGRS.2019.2949351

Corresponding author: Cheng Wang

School of Optics and Photonics

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

Image sensors from different domains, or different imaging mechanisms, produce cross-domain images when sensing the same scene. Because of the domain shift between such images, the gap between domains is the major challenge in measuring the similarity of feature descriptors extracted from images of different domains. In particular, matching ground camera images with unmanned aerial vehicle (UAV) 3-D model-rendered images, an extremely challenging pair of cross-domain image types, is one way to indirectly establish the spatial relationship between 2-D and 3-D space. This provides a solution for the virtual-real registration of augmented reality (AR) in outdoor environments. However, handcrafted descriptors and existing learning-based feature descriptors are of limited effectiveness when matching the rendered images. In this letter, first, to learn robust and invariant 128-D local feature descriptors for ground camera and rendered images, we present a novel network structure, SiamAM-Net, which embeds autoencoders with an attention mechanism into a Siamese network. Then, to narrow the gap between the cross-domain images during the optimization of SiamAM-Net, we design an adaptive margin for the loss function. Finally, we match ground camera images with rendered images using the learned local feature descriptors and explore outdoor AR virtual-real registration. Experiments show that the local feature descriptors learned by SiamAM-Net are robust and achieve state-of-the-art retrieval performance on the cross-domain data set of ground camera and rendered images. In addition, several outdoor AR applications demonstrate the usefulness of the proposed outdoor AR virtual-real registration.
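The abstract does not state the form of the adaptive margin, so the following Python sketch only illustrates the general idea under explicit assumptions: a contrastive loss over pairs of descriptors (one from a ground camera patch, one from a rendered patch), where the margin for non-matching pairs grows with the mean distance of the matching pairs in the batch. The margin rule, the L2 normalization, and the names `adaptive_margin_contrastive_loss`, `base_margin` and `alpha` are hypothetical choices for illustration, not the letter's formulation. The letter's descriptors are 128-D; the toy vectors here are 2-D.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a descriptor to unit length (an assumption, common for learned descriptors)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adaptive_margin_contrastive_loss(
    pairs: Sequence[Tuple[Vector, Vector]],
    labels: Sequence[int],
    base_margin: float = 1.0,
    alpha: float = 0.5,
) -> Tuple[float, float]:
    """Contrastive loss with a HYPOTHETICAL batch-adaptive margin.

    pairs  -- (ground_descriptor, rendered_descriptor) tuples
    labels -- 1 if both patches depict the same location, else 0
    Returns (mean loss over the batch, margin that was used).
    """
    if not pairs or len(pairs) != len(labels):
        raise ValueError("pairs and labels must be non-empty and equally long")
    dists = [l2_distance(l2_normalize(g), l2_normalize(r)) for g, r in pairs]
    pos = [d for d, y in zip(dists, labels) if y == 1]
    mean_pos = sum(pos) / len(pos) if pos else 0.0
    # Hypothetical rule: a larger residual cross-domain gap among matching
    # pairs demands a larger separation for non-matching pairs.
    margin = base_margin + alpha * mean_pos
    total = 0.0
    for d, y in zip(dists, labels):
        if y == 1:
            total += d * d  # pull matching pairs together
        else:
            total += max(0.0, margin - d) ** 2  # push non-matching pairs apart
    return total / len(dists), margin


if __name__ == "__main__":
    batch = [([1.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [0.0, 1.0])]
    loss, margin = adaptive_margin_contrastive_loss(batch, [1, 0])
    print(f"loss={loss:.4f} margin={margin:.4f}")
```

In a Siamese setup both descriptors of a pair would come from the same weight-shared encoder; the sketch starts from the descriptors and leaves out the network itself.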

Original language	English
Article number	8894486
Pages (from-to)	1608-1612
Number of pages	5
Journal	IEEE Geoscience and Remote Sensing Letters
Volume	17
Issue number	9
DOIs	https://doi.org/10.1109/LGRS.2019.2949351
Publication status	Published - Sept 2020

Keywords

Attention mechanism
Siamese network
augmented reality (AR)
cross-domain image patch matching
virtual-real registration
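Once descriptors are learned, cross-domain patch matching reduces to a nearest-neighbour search between the ground camera descriptor set and the rendered descriptor set. The Python sketch below assumes the descriptors are already computed. The mutual nearest-neighbour check and the top-1 retrieval accuracy are common illustrative choices, not necessarily the matching strategy or retrieval metric used in the letter, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Vector, database: Sequence[Vector]) -> int:
    """Index of the database descriptor closest to the query (lowest index wins ties)."""
    return min(range(len(database)), key=lambda j: l2_distance(query, database[j]))


def mutual_nn_matches(
    ground: Sequence[Vector], rendered: Sequence[Vector]
) -> List[Tuple[int, int]]:
    """Keep (i, j) only if rendered[j] is ground[i]'s nearest neighbour and vice versa."""
    forward = [nearest(g, rendered) for g in ground]
    backward = [nearest(r, ground) for r in rendered]
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]


def top1_accuracy(ground: Sequence[Vector], rendered: Sequence[Vector]) -> float:
    """Fraction of ground patches whose nearest rendered patch is the true match.

    Assumes ground[i] and rendered[i] depict the same location.
    """
    hits = sum(1 for i, g in enumerate(ground) if nearest(g, rendered) == i)
    return hits / len(ground)


if __name__ == "__main__":
    ground = [[1.0, 0.0], [0.0, 1.0], [0.0, -0.8]]
    rendered = [[0.9, 0.1], [0.1, 0.9], [0.0, -1.0]]
    print(mutual_nn_matches(ground, rendered))
    print(top1_accuracy(ground, rendered))
```

The mutual check discards one-sided matches, which trades some recall for fewer false correspondences; whether that trade-off suits virtual-real registration depends on the downstream pose estimation.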

Cite this

@article{1555e4d9049748479c6a3fde4db3870a,

title = "Learning to Match Ground Camera Image and UAV 3-D Model-Rendered Image Based on Siamese Network with Attention Mechanism",

keywords = "Attention mechanism, Siamese network, augmented reality (AR), cross-domain image patch matching, virtual-real registration",

author = "Weiquan Liu and Cheng Wang and Xuesheng Bian and Shuting Chen and Shangshu Yu and Xiuhong Lin and Lai, {Shang Hong} and Dongdong Weng and Jonathan Li",

note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2020",

month = sep,

doi = "10.1109/LGRS.2019.2949351",

language = "English",

volume = "17",

pages = "1608--1612",

journal = "IEEE Geoscience and Remote Sensing Letters",

issn = "1545-598X",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY  - JOUR
T1  - Learning to Match Ground Camera Image and UAV 3-D Model-Rendered Image Based on Siamese Network with Attention Mechanism
AU  - Liu, Weiquan
AU  - Wang, Cheng
AU  - Bian, Xuesheng
AU  - Chen, Shuting
AU  - Yu, Shangshu
AU  - Lin, Xiuhong
AU  - Lai, Shang Hong
AU  - Weng, Dongdong
AU  - Li, Jonathan
PY  - 2020/9
Y1  - 2020/9
AB  - Image sensors from different domains, or different imaging mechanisms, produce cross-domain images when sensing the same scene. Because of the domain shift between such images, the gap between domains is the major challenge in measuring the similarity of feature descriptors extracted from images of different domains. In particular, matching ground camera images with unmanned aerial vehicle (UAV) 3-D model-rendered images, an extremely challenging pair of cross-domain image types, is one way to indirectly establish the spatial relationship between 2-D and 3-D space. This provides a solution for the virtual-real registration of augmented reality (AR) in outdoor environments. However, handcrafted descriptors and existing learning-based feature descriptors are of limited effectiveness when matching the rendered images. In this letter, first, to learn robust and invariant 128-D local feature descriptors for ground camera and rendered images, we present a novel network structure, SiamAM-Net, which embeds autoencoders with an attention mechanism into a Siamese network. Then, to narrow the gap between the cross-domain images during the optimization of SiamAM-Net, we design an adaptive margin for the loss function. Finally, we match ground camera images with rendered images using the learned local feature descriptors and explore outdoor AR virtual-real registration. Experiments show that the local feature descriptors learned by SiamAM-Net are robust and achieve state-of-the-art retrieval performance on the cross-domain data set of ground camera and rendered images. In addition, several outdoor AR applications demonstrate the usefulness of the proposed outdoor AR virtual-real registration.
KW  - Attention mechanism
KW  - Siamese network
KW  - augmented reality (AR)
KW  - cross-domain image patch matching
KW  - virtual-real registration
UR  - http://www.scopus.com/inward/record.url?scp=85085373012&partnerID=8YFLogxK
U2  - 10.1109/LGRS.2019.2949351
DO  - 10.1109/LGRS.2019.2949351
M3  - Article
AN  - SCOPUS:85085373012
SN  - 1545-598X
VL  - 17
SP  - 1608
EP  - 1612
JO  - IEEE Geoscience and Remote Sensing Letters
JF  - IEEE Geoscience and Remote Sensing Letters
IS  - 9
M1  - 8894486
ER  - 