Full Semantic Constructed Network for Urban Use Classification From Very High-Resolution Optical Remote Sensing Imagery

Shan Dong; Yin Zhuang; He Chen; Tong Zhang; Lianlin Li; Teng Long

doi:10.1109/TGRS.2022.3225144

Shan Dong, Yin Zhuang, He Chen^*, Tong Zhang, Lianlin Li, Teng Long

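^*Corresponding author for this work

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract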

Recently, semantic segmentation technology has been a research hotspot in optical remote sensing urban use classification. However, because of coupled semantic relations in very high-resolution and complex urban scenes, a more effective semantic description for pixelwise urban use interpretation has become a challenge. Then, aiming to set up a more effective semantic description, the effective receptive field (ERF) is analyzed in general convolutional neural networks. The unreasonable ERF distribution in the stacked convolutional layers of the encoder would lead to a large amount of small ERFs and fewer not large enough ERFs that form a naive semantic description in the decoder. Therefore, in this article, a novel full semantic constructed network (FSCNet) is proposed to improve the naive semantic description and set up an effective semantic description. First, to avoid noise from shallow feature layers, a residual refinement convolution is designed to optimize the full-scale skip connections based on the U-shaped encoder-decoder. Second, an interscale fusion module is newly designed for multiscale feature fusion, which can generate three initial semantic modalities that are prepared for redefining the full semantic description. Third, a multiscale local context spatial attention module and boundary supervision are designed for an initial shallow semantic modality to capture the pure boundary information, and then, pyramid spatial pooling is employed for an initial deep semantic modality to further enlarge the ERF and obtain more abstract global information. Next, a self-calibration convolution combined with the atrous spatial pyramid pooling is designed to rectify and enrich an initial middle semantic modality, which can improve the naive semantic description and bridge the semantic gap between the redefined shallow and deep semantic modalities to advance the full semantic feature fusion. 
Finally, extensive experiments are carried out on three benchmarks (e.g., ISPRS Vaihingen, Potsdam, and DLRSD), and comparative results show that the proposed FSCNet can get remarkable performance compared to state-of-the-art (SOTA) methods. Besides, the code is available at https://github.com/DorisCV/FSCNet.
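
The abstract's point about receptive fields can be illustrated with a small calculation. The following is a minimal sketch, not the authors' ERF analysis: it computes only the theoretical receptive field of a stack of convolutions, which is an upper bound on the effective receptive field (the ERF is generally smaller). The function name and the two layer configurations are hypothetical and are not taken from FSCNet.

```python
from typing import List, Tuple


def theoretical_receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Theoretical receptive field, in input pixels, of stacked convolutions.

    Each layer is given as (kernel_size, stride, dilation).
    """
    rf = 1    # a single input pixel before any layer is applied
    jump = 1  # distance, in input pixels, between neighbouring feature positions
    for kernel, stride, dilation in layers:
        # A dilated kernel spans (kernel - 1) * dilation steps of the current grid.
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical configurations, for illustration only.
plain = [(3, 1, 1)] * 4                                  # four plain 3x3 convolutions
atrous = [(3, 1, 1), (3, 1, 2), (3, 1, 4), (3, 1, 8)]    # same depth, growing dilation

print(theoretical_receptive_field(plain))   # 9
print(theoretical_receptive_field(atrous))  # 31
```

With the same number of layers, the dilated stack covers a much wider input window, which is the general motivation for atrous convolutions and pyramid pooling when larger receptive fields are needed.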

Original language	English
Article number	5606820
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	61
DOI	https://doi.org/10.1109/TGRS.2022.3225144
Publication status	Published - 2023

Cite this

Dong, S., Zhuang, Y., Chen, H., Zhang, T., Li, L., & Long, T. (2023). Full Semantic Constructed Network for Urban Use Classification From Very High-Resolution Optical Remote Sensing Imagery. IEEE Transactions on Geoscience and Remote Sensing, 61, Article 5606820. https://doi.org/10.1109/TGRS.2022.3225144

@article{779ca1b2fe4e41cda07142772d863db1,

title = "Full Semantic Constructed Network for Urban Use Classification From Very High-Resolution Optical Remote Sensing Imagery",

abstract = "Recently, semantic segmentation technology has been a research hotspot in optical remote sensing urban use classification. However, because of coupled semantic relations in very high-resolution and complex urban scenes, a more effective semantic description for pixelwise urban use interpretation has become a challenge. Then, aiming to set up a more effective semantic description, the effective receptive field (ERF) is analyzed in general convolutional neural networks. The unreasonable ERF distribution in the stacked convolutional layers of the encoder would lead to a large amount of small ERFs and fewer not large enough ERFs that form a naive semantic description in the decoder. Therefore, in this article, a novel full semantic constructed network (FSCNet) is proposed to improve the naive semantic description and set up an effective semantic description. First, to avoid noise from shallow feature layers, a residual refinement convolution is designed to optimize the full-scale skip connections based on the U-shaped encoder-decoder. Second, an interscale fusion module is newly designed for multiscale feature fusion, which can generate three initial semantic modalities that are prepared for redefining the full semantic description. Third, a multiscale local context spatial attention module and boundary supervision are designed for an initial shallow semantic modality to capture the pure boundary information, and then, pyramid spatial pooling is employed for an initial deep semantic modality to further enlarge the ERF and obtain more abstract global information. Next, a self-calibration convolution combined with the atrous spatial pyramid pooling is designed to rectify and enrich an initial middle semantic modality, which can improve the naive semantic description and bridge the semantic gap between the redefined shallow and deep semantic modalities to advance the full semantic feature fusion. 
Finally, extensive experiments are carried out on three benchmarks (e.g., ISPRS Vaihingen, Potsdam, and DLRSD), and comparative results show that the proposed FSCNet can get remarkable performance compared to state-of-the-art (SOTA) methods. Besides, the code is available at https://github.com/DorisCV/FSCNet.",

keywords = "Full semantic description, optical remote sensing, urban use classification, very high resolution (VHR)",

author = "Shan Dong and Yin Zhuang and He Chen and Tong Zhang and Lianlin Li and Teng Long",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2023",

doi = "10.1109/TGRS.2022.3225144",

language = "English",

volume = "61",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Full Semantic Constructed Network for Urban Use Classification From Very High-Resolution Optical Remote Sensing Imagery

AU - Dong, Shan

AU - Zhuang, Yin

AU - Chen, He

AU - Zhang, Tong

AU - Li, Lianlin

AU - Long, Teng

PY - 2023

Y1 - 2023

N2 - Recently, semantic segmentation technology has been a research hotspot in optical remote sensing urban use classification. However, because of coupled semantic relations in very high-resolution and complex urban scenes, a more effective semantic description for pixelwise urban use interpretation has become a challenge. Then, aiming to set up a more effective semantic description, the effective receptive field (ERF) is analyzed in general convolutional neural networks. The unreasonable ERF distribution in the stacked convolutional layers of the encoder would lead to a large amount of small ERFs and fewer not large enough ERFs that form a naive semantic description in the decoder. Therefore, in this article, a novel full semantic constructed network (FSCNet) is proposed to improve the naive semantic description and set up an effective semantic description. First, to avoid noise from shallow feature layers, a residual refinement convolution is designed to optimize the full-scale skip connections based on the U-shaped encoder-decoder. Second, an interscale fusion module is newly designed for multiscale feature fusion, which can generate three initial semantic modalities that are prepared for redefining the full semantic description. Third, a multiscale local context spatial attention module and boundary supervision are designed for an initial shallow semantic modality to capture the pure boundary information, and then, pyramid spatial pooling is employed for an initial deep semantic modality to further enlarge the ERF and obtain more abstract global information. Next, a self-calibration convolution combined with the atrous spatial pyramid pooling is designed to rectify and enrich an initial middle semantic modality, which can improve the naive semantic description and bridge the semantic gap between the redefined shallow and deep semantic modalities to advance the full semantic feature fusion. 
Finally, extensive experiments are carried out on three benchmarks (e.g., ISPRS Vaihingen, Potsdam, and DLRSD), and comparative results show that the proposed FSCNet can get remarkable performance compared to state-of-the-art (SOTA) methods. Besides, the code is available at https://github.com/DorisCV/FSCNet.

AB - Recently, semantic segmentation technology has been a research hotspot in optical remote sensing urban use classification. However, because of coupled semantic relations in very high-resolution and complex urban scenes, a more effective semantic description for pixelwise urban use interpretation has become a challenge. Then, aiming to set up a more effective semantic description, the effective receptive field (ERF) is analyzed in general convolutional neural networks. The unreasonable ERF distribution in the stacked convolutional layers of the encoder would lead to a large amount of small ERFs and fewer not large enough ERFs that form a naive semantic description in the decoder. Therefore, in this article, a novel full semantic constructed network (FSCNet) is proposed to improve the naive semantic description and set up an effective semantic description. First, to avoid noise from shallow feature layers, a residual refinement convolution is designed to optimize the full-scale skip connections based on the U-shaped encoder-decoder. Second, an interscale fusion module is newly designed for multiscale feature fusion, which can generate three initial semantic modalities that are prepared for redefining the full semantic description. Third, a multiscale local context spatial attention module and boundary supervision are designed for an initial shallow semantic modality to capture the pure boundary information, and then, pyramid spatial pooling is employed for an initial deep semantic modality to further enlarge the ERF and obtain more abstract global information. Next, a self-calibration convolution combined with the atrous spatial pyramid pooling is designed to rectify and enrich an initial middle semantic modality, which can improve the naive semantic description and bridge the semantic gap between the redefined shallow and deep semantic modalities to advance the full semantic feature fusion. 
Finally, extensive experiments are carried out on three benchmarks (e.g., ISPRS Vaihingen, Potsdam, and DLRSD), and comparative results show that the proposed FSCNet can get remarkable performance compared to state-of-the-art (SOTA) methods. Besides, the code is available at https://github.com/DorisCV/FSCNet.

KW - Full semantic description

KW - optical remote sensing

KW - urban use classification

KW - very high resolution (VHR)

UR - http://www.scopus.com/inward/record.url?scp=85144076232&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2022.3225144

DO - 10.1109/TGRS.2022.3225144

M3 - Article

AN - SCOPUS:85144076232

SN - 0196-2892

VL - 61

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5606820

ER -
