TY - JOUR
T1 - Deep Multimodal Fusion Model for Building Structural Type Recognition Using Multisource Remote Sensing Images and Building-Related Knowledge
AU - Zhou, Yuhang
AU - Tan, Yihua
AU - Wen, Qi
AU - Wang, Wei
AU - Li, Lingling
AU - Li, Zhenxing
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Building structural type (BST) information is vital for seismic risk and vulnerability modeling. However, obtaining this kind of information is not a trivial task. The conventional method involves a labor-intensive and inefficient manual inspection of each building. Recently, a few methods have explored using remote sensing images and building-related knowledge (BRK) to automate BST recognition. However, these methods have several limitations, such as insufficient mining of multimodal information and difficulty obtaining BRK, which hinder their adoption and practical use. To alleviate these shortcomings, we propose a deep multimodal fusion model that combines satellite optical remote sensing images, aerial synthetic aperture radar images, and BRK (roof type, color, and group pattern) obtained by domain experts to achieve accurate automatic reasoning of BSTs. Specifically, we first use a pseudo-siamese network to extract image features. Second, we construct a knowledge graph (KG) based on the BRK and use a graph attention network to extract semantic features from the KG. Third, we propose a novel multistage gated fusion mechanism to fuse the image and semantic features. Our method's best overall accuracy and kappa coefficient on the dataset collected in the study area are 90.35% and 0.83, respectively, outperforming a series of existing methods. With our model, high-precision BST information can be obtained for earthquake disaster prevention, reduction, and emergency decision making.
KW - Building structural types (BSTs)
KW - knowledge graph (KG)
KW - multimodal fusion
KW - remote sensing
UR - http://www.scopus.com/inward/record.url?scp=85174859366&partnerID=8YFLogxK
U2 - 10.1109/JSTARS.2023.3323484
DO - 10.1109/JSTARS.2023.3323484
M3 - Article
AN - SCOPUS:85174859366
SN - 1939-1404
VL - 16
SP - 10073
EP - 10087
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -