Deep Multimodal Fusion Model for Building Structural Type Recognition Using Multisource Remote Sensing Images and Building-Related Knowledge

Yuhang Zhou; Yihua Tan; Qi Wen; Wei Wang; Lingling Li; Zhenxing Li

doi:10.1109/JSTARS.2023.3323484

Yuhang Zhou, Yihua Tan^*, Qi Wen^*, Wei Wang, Lingling Li, Zhenxing Li

^*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Building structural type (BST) information is vital for seismic risk and vulnerability modeling. However, obtaining this information is not a trivial task. The conventional method involves a labor-intensive and inefficient manual inspection of each building. Recently, a few methods have explored using remote sensing images and building-related knowledge (BRK) to automate BST recognition. However, these methods have notable limitations, such as insufficient mining of multimodal information and difficulty in obtaining BRK, which hinder their adoption and practical use. To alleviate these shortcomings, we propose a deep multimodal fusion model that combines satellite optical remote sensing images, aerial synthetic aperture radar images, and BRK (roof type, color, and group pattern) obtained by domain experts to achieve accurate automatic reasoning of BSTs. Specifically, first, we use a pseudo-siamese network to extract the image features. Second, we construct a knowledge graph (KG) based on the BRK and use a graph attention network to extract the semantic features from the KG. Third, we propose a novel multistage gated fusion mechanism to fuse the image and semantic features. On the dataset collected in the study area, our method's best overall accuracy and kappa coefficient are 90.35% and 0.83, respectively, outperforming a series of existing methods. With our model, high-precision BST information can be obtained for earthquake disaster prevention, reduction, and emergency decision making.
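
The abstract reports two evaluation metrics, overall accuracy and the kappa coefficient. As a minimal sketch of how these are conventionally computed from a confusion matrix (assuming the standard Cohen's kappa definition; the function name and the example counts below are hypothetical and are not taken from the paper):

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: sum over classes of row marginal times column marginal.
    row_totals = [sum(row) for row in confusion]
    col_totals = [
        sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)
    ]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Hypothetical 3-class confusion matrix (made-up counts, not from the paper).
example = [
    [50, 3, 2],
    [4, 40, 1],
    [1, 2, 30],
]
oa, kappa = overall_accuracy_and_kappa(example)
print(f"overall accuracy = {oa:.4f}, kappa = {kappa:.4f}")
```

Kappa discounts the agreement expected by chance, so it is typically lower than overall accuracy when class frequencies are imbalanced.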

Original language	English
Pages (from-to)	10073-10087
Number of pages	15
Journal	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Volume	16
DOI	https://doi.org/10.1109/JSTARS.2023.3323484
Publication status	Published - 2023
Externally published	Yes

Cite this

@article{d4702e0222de464e87eba2b5ace98aab,

title = "Deep Multimodal Fusion Model for Building Structural Type Recognition Using Multisource Remote Sensing Images and Building-Related Knowledge",

abstract = "Building structural type (BST) information is vital for seismic risk and vulnerability modeling. However, obtaining this kind of information is not a trivial task. The conventional method involves a labor-intensive and inefficient manual inspection process for each building. Nowadays, a few methods have explored to use remote sensing images and some building-related knowledge (BRK) to realize automated BST recognition. However, these methods have many limitations, such as insufficient mining of multimodal information and difficulty obtaining BRK, which hinders their promotion and practical use. To alleviate the shortcomings above, we propose a deep multimodal fusion model, which combines satellite optical remote sensing image, aerial synthetic aperture radar image, and BRK (roof type, color, and group pattern) obtained by domain experts to achieve accurate automatic reasoning of BSTs. Specifically, first, we use a pseudo-siamese network to extract the image feature. Second, a knowledge graph (KG) based on the BRK is constructed, and then, we use a graph attention network to extract the semantic feature from the KG. Third, we propose a novel multistage gated fusion mechanism to fuse the image and semantic feature. Our method's best overall accuracy and kappa coefficient on the dataset collected in the study area are 90.35% and 0.83, which outperforms a series of existing methods. Through our model, high-precision BST information can be obtained for earthquake disaster prevention, reduction, and emergency decision making.",

keywords = "Building structural types (BSTs), knowledge graph (KG), multimodal fusion, remote sensing",

author = "Yuhang Zhou and Yihua Tan and Qi Wen and Wei Wang and Lingling Li and Zhenxing Li",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.",

year = "2023",

doi = "10.1109/JSTARS.2023.3323484",

language = "English",

volume = "16",

pages = "10073--10087",

journal = "IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing",

issn = "1939-1404",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Deep Multimodal Fusion Model for Building Structural Type Recognition Using Multisource Remote Sensing Images and Building-Related Knowledge

AU - Zhou, Yuhang

AU - Tan, Yihua

AU - Wen, Qi

AU - Wang, Wei

AU - Li, Lingling

AU - Li, Zhenxing

PY - 2023

Y1 - 2023

AB - Building structural type (BST) information is vital for seismic risk and vulnerability modeling. However, obtaining this kind of information is not a trivial task. The conventional method involves a labor-intensive and inefficient manual inspection process for each building. Nowadays, a few methods have explored to use remote sensing images and some building-related knowledge (BRK) to realize automated BST recognition. However, these methods have many limitations, such as insufficient mining of multimodal information and difficulty obtaining BRK, which hinders their promotion and practical use. To alleviate the shortcomings above, we propose a deep multimodal fusion model, which combines satellite optical remote sensing image, aerial synthetic aperture radar image, and BRK (roof type, color, and group pattern) obtained by domain experts to achieve accurate automatic reasoning of BSTs. Specifically, first, we use a pseudo-siamese network to extract the image feature. Second, a knowledge graph (KG) based on the BRK is constructed, and then, we use a graph attention network to extract the semantic feature from the KG. Third, we propose a novel multistage gated fusion mechanism to fuse the image and semantic feature. Our method's best overall accuracy and kappa coefficient on the dataset collected in the study area are 90.35% and 0.83, which outperforms a series of existing methods. Through our model, high-precision BST information can be obtained for earthquake disaster prevention, reduction, and emergency decision making.

KW - Building structural types (BSTs)

KW - knowledge graph (KG)

KW - multimodal fusion

KW - remote sensing

UR - http://www.scopus.com/inward/record.url?scp=85174859366&partnerID=8YFLogxK

U2 - 10.1109/JSTARS.2023.3323484

DO - 10.1109/JSTARS.2023.3323484

M3 - Article

AN - SCOPUS:85174859366

SN - 1939-1404

VL - 16

SP - 10073

EP - 10087

JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ER -
