TY - JOUR
T1 - CSTSUNet
T2 - A Cross Swin Transformer-Based Siamese U-Shape Network for Change Detection in Remote Sensing Images
AU - Wu, Yaping
AU - Li, Lu
AU - Wang, Nan
AU - Li, Wei
AU - Fan, Junfang
AU - Tao, Ran
AU - Wen, Xin
AU - Wang, Yanfeng
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - Change detection (CD) in remote sensing (RS) images is a critical task in which deep learning has achieved significant success. Current networks often employ pixel-based differencing, proportion, classification-based, or feature-concatenation methods to represent changes of interest. However, these methods fail to effectively detect the desired changes because they are highly sensitive to factors such as atmospheric conditions, lighting variations, and phenological variations, resulting in detection errors. Inspired by the transformer structure, we adopt a cross-attention mechanism to more robustly extract feature differences between bitemporal images. The method is motivated by the assumption that, if there is no change between an image pair, the semantic features of one temporal image can be well represented by the semantic features of the other; conversely, if there is a change, significant reconstruction errors arise. Therefore, a Cross Swin transformer-based Siamese U-shaped network, named CSTSUNet, is proposed for RS CD. CSTSUNet consists of an encoder, difference feature extraction, and a decoder. The encoder is based on a hierarchical residual network (ResNet) with a Siamese U-Net structure, allowing parallel processing of bitemporal images and extraction of multiscale features. The difference feature extraction consists of four difference feature extraction modules that compute difference features at multiple scales; a Cross Swin transformer is employed in each module to exchange information between the bitemporal images. The decoder takes the multiscale difference features as input and injects details and boundaries iteratively, level by level, making the change map progressively more accurate. We conduct experiments on three public datasets, and the experimental results demonstrate that the proposed CSTSUNet outperforms other state-of-the-art methods in both qualitative and quantitative analyses. Our code is available at https://github.com/l7170/CSTSUNet.git.
AB - Change detection (CD) in remote sensing (RS) images is a critical task in which deep learning has achieved significant success. Current networks often employ pixel-based differencing, proportion, classification-based, or feature-concatenation methods to represent changes of interest. However, these methods fail to effectively detect the desired changes because they are highly sensitive to factors such as atmospheric conditions, lighting variations, and phenological variations, resulting in detection errors. Inspired by the transformer structure, we adopt a cross-attention mechanism to more robustly extract feature differences between bitemporal images. The method is motivated by the assumption that, if there is no change between an image pair, the semantic features of one temporal image can be well represented by the semantic features of the other; conversely, if there is a change, significant reconstruction errors arise. Therefore, a Cross Swin transformer-based Siamese U-shaped network, named CSTSUNet, is proposed for RS CD. CSTSUNet consists of an encoder, difference feature extraction, and a decoder. The encoder is based on a hierarchical residual network (ResNet) with a Siamese U-Net structure, allowing parallel processing of bitemporal images and extraction of multiscale features. The difference feature extraction consists of four difference feature extraction modules that compute difference features at multiple scales; a Cross Swin transformer is employed in each module to exchange information between the bitemporal images. The decoder takes the multiscale difference features as input and injects details and boundaries iteratively, level by level, making the change map progressively more accurate. We conduct experiments on three public datasets, and the experimental results demonstrate that the proposed CSTSUNet outperforms other state-of-the-art methods in both qualitative and quantitative analyses. Our code is available at https://github.com/l7170/CSTSUNet.git.
KW - Change detection (CD)
KW - deep learning
KW - remote sensing (RS) image
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85176300304&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3326813
DO - 10.1109/TGRS.2023.3326813
M3 - Article
AN - SCOPUS:85176300304
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5623715
ER -