TY - JOUR
T1 - U-Shaped CNN-ViT Siamese Network with Learnable Mask Guidance for Remote Sensing Building Change Detection
AU - Cui, Yongjing
AU - Chen, He
AU - Dong, Shan
AU - Wang, Guanqun
AU - Zhuang, Yin
N1 - Publisher Copyright: © 2008-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Building change detection (BCD) aims to identify new or disappeared buildings from bitemporal images. However, the varied scales and appearances of buildings, along with the challenge of pseudochange interference from complex backgrounds, make it difficult to accurately extract complete changes. To address these challenges in BCD, a U-shaped hybrid Siamese network combining a convolutional neural network and a vision transformer (CNN-ViT) with learnable mask guidance, called U-Conformer, is designed. First, a new hybrid architecture of U-Conformer is proposed. The architecture integrates the strengths of CNNs and ViTs to establish a robust, multiscale heterogeneous representation that aids in detecting buildings of various sizes. Second, a learnable mask guidance module is specifically designed for U-Conformer, focusing the multiscale heterogeneous representation on extracting relevant scale changes while progressively suppressing pseudochanges. Furthermore, for the U-Conformer architecture, a mask information joint class-balanced loss function that combines the binary cross-entropy loss function and the dice loss function is devised, significantly mitigating the issue of class imbalance. Experimental results on three publicly available change detection datasets, LEVIR-CD, WHU-CD, and GZ-CD, demonstrate that U-Conformer surpasses previous methods, achieving F1 scores of 91.5%, 94.6%, and 86.7% and IoU scores of 84.3%, 89.7%, and 76.5%, respectively.
KW - Building change detection (BCD)
KW - U-shaped convolutional-neural-network-vision-transformer (CNN-ViT)
KW - learnable mask guidance
KW - multiscale representation
KW - remote sensing
UR - http://www.scopus.com/inward/record.url?scp=85195365289&partnerID=8YFLogxK
U2 - 10.1109/JSTARS.2024.3408604
DO - 10.1109/JSTARS.2024.3408604
M3 - Article
AN - SCOPUS:85195365289
SN - 1939-1404
VL - 17
SP - 11402
EP - 11418
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -