TY - JOUR
T1 - Learning to Match Ground Camera Image and UAV 3-D Model-Rendered Image Based on Siamese Network with Attention Mechanism
AU - Liu, Weiquan
AU - Wang, Cheng
AU - Bian, Xuesheng
AU - Chen, Shuting
AU - Yu, Shangshu
AU - Lin, Xiuhong
AU - Lai, Shang-Hong
AU - Weng, Dongdong
AU - Li, Jonathan
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2020/9
Y1 - 2020/9
N2 - Different imaging sensors or imaging mechanisms produce cross-domain images when sensing the same scene. The domain shift between such images, i.e., the image gap between different domains, is the major challenge in measuring the similarity of feature descriptors extracted from different domain images. Specifically, matching ground camera images and unmanned aerial vehicle (UAV) 3-D model-rendered images, two kinds of extremely challenging cross-domain images, is a way to indirectly establish the spatial relationship between 2-D and 3-D spaces. This provides a solution for the virtual-real registration of augmented reality (AR) in outdoor environments. However, both handcrafted descriptors and existing learning-based feature descriptors have limited matching performance on rendered images. In this letter, first, to learn robust and invariant 128-D local feature descriptors for ground camera and rendered images, we present a novel network structure, SiamAM-Net, which embeds autoencoders with an attention mechanism into a Siamese network. Then, to narrow the gap between the cross-domain images during the optimization of SiamAM-Net, we design an adaptive margin for the loss function. Finally, we match the ground camera and rendered images using the learned local feature descriptors and explore outdoor AR virtual-real registration. Experiments show that the local feature descriptors learned by SiamAM-Net are robust and achieve state-of-the-art retrieval performance on the cross-domain image data set of ground camera and rendered images. In addition, several outdoor AR applications also demonstrate the usefulness of the proposed outdoor AR virtual-real registration.
AB - Different imaging sensors or imaging mechanisms produce cross-domain images when sensing the same scene. The domain shift between such images, i.e., the image gap between different domains, is the major challenge in measuring the similarity of feature descriptors extracted from different domain images. Specifically, matching ground camera images and unmanned aerial vehicle (UAV) 3-D model-rendered images, two kinds of extremely challenging cross-domain images, is a way to indirectly establish the spatial relationship between 2-D and 3-D spaces. This provides a solution for the virtual-real registration of augmented reality (AR) in outdoor environments. However, both handcrafted descriptors and existing learning-based feature descriptors have limited matching performance on rendered images. In this letter, first, to learn robust and invariant 128-D local feature descriptors for ground camera and rendered images, we present a novel network structure, SiamAM-Net, which embeds autoencoders with an attention mechanism into a Siamese network. Then, to narrow the gap between the cross-domain images during the optimization of SiamAM-Net, we design an adaptive margin for the loss function. Finally, we match the ground camera and rendered images using the learned local feature descriptors and explore outdoor AR virtual-real registration. Experiments show that the local feature descriptors learned by SiamAM-Net are robust and achieve state-of-the-art retrieval performance on the cross-domain image data set of ground camera and rendered images. In addition, several outdoor AR applications also demonstrate the usefulness of the proposed outdoor AR virtual-real registration.
KW - Attention mechanism
KW - Siamese network
KW - augmented reality (AR)
KW - cross-domain image patch matching
KW - virtual-real registration
UR - http://www.scopus.com/inward/record.url?scp=85085373012&partnerID=8YFLogxK
U2 - 10.1109/LGRS.2019.2949351
DO - 10.1109/LGRS.2019.2949351
M3 - Article
AN - SCOPUS:85085373012
SN - 1545-598X
VL - 17
SP - 1608
EP - 1612
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
IS - 9
M1 - 8894486
ER -