TY - JOUR
T1 - Full Semantic Constructed Network for Urban Use Classification From Very High-Resolution Optical Remote Sensing Imagery
AU - Dong, Shan
AU - Zhuang, Yin
AU - Chen, He
AU - Zhang, Tong
AU - Li, Lianlin
AU - Long, Teng
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - Recently, semantic segmentation has become a research hotspot in urban use classification from optical remote sensing imagery. However, because semantic relations are coupled in very high-resolution, complex urban scenes, building an effective semantic description for pixelwise urban use interpretation remains a challenge. To establish a more effective semantic description, the effective receptive field (ERF) of general convolutional neural networks is analyzed. The unreasonable ERF distribution in the stacked convolutional layers of the encoder leads to a large number of small ERFs and only a few larger ERFs that are still not large enough, which together form a naive semantic description in the decoder. Therefore, in this article, a novel full semantic constructed network (FSCNet) is proposed to improve the naive semantic description and set up an effective one. First, to avoid noise from shallow feature layers, a residual refinement convolution is designed to optimize the full-scale skip connections of the U-shaped encoder-decoder. Second, an interscale fusion module is designed for multiscale feature fusion; it generates three initial semantic modalities that are used to redefine the full semantic description. Third, a multiscale local context spatial attention module and boundary supervision are designed for the initial shallow semantic modality to capture pure boundary information, and pyramid spatial pooling is applied to the initial deep semantic modality to further enlarge the ERF and obtain more abstract global information. Next, a self-calibration convolution combined with atrous spatial pyramid pooling is designed to rectify and enrich the initial middle semantic modality, which improves the naive semantic description and bridges the semantic gap between the redefined shallow and deep semantic modalities to advance the full semantic feature fusion. Finally, extensive experiments are carried out on three benchmarks (ISPRS Vaihingen, Potsdam, and DLRSD), and comparative results show that the proposed FSCNet achieves remarkable performance compared with state-of-the-art (SOTA) methods. The code is available at https://github.com/DorisCV/FSCNet.
KW - Full semantic description
KW - optical remote sensing
KW - urban use classification
KW - very high resolution (VHR)
UR - http://www.scopus.com/inward/record.url?scp=85144076232&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2022.3225144
DO - 10.1109/TGRS.2022.3225144
M3 - Article
AN - SCOPUS:85144076232
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5606820
ER -