TY - JOUR
T1 - A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images
AU - Sun, Yu
AU - Bi, Fukun
AU - Gao, Yangte
AU - Chen, Liang
AU - Feng, Suting
N1 - Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2022/5
Y1 - 2022/5
N2 - In recent years, with the development of deep learning, semantic segmentation for remote sensing images has gradually become a hot issue in computer vision. However, segmentation for multicategory targets is still a difficult problem. To address the issues regarding poor precision and multiple scales in different categories, we propose a UNet, based on multi-attention (MA-UNet). Specifically, we propose a residual encoder, based on a simple attention module, to improve the extraction capability of the backbone for fine-grained features. By using multi-head self-attention for the lowest level feature, the semantic representation of the given feature map is reconstructed, further implementing fine-grained segmentation for different categories of pixels. Then, to address the problem of multiple scales in different categories, we increase the number of down-sampling to subdivide the feature sizes of the target at different scales, and use channel attention and spatial attention in different feature fusion stages, to better fuse the feature information of the target at different scales. We conducted experiments on the WHDLD datasets and DLRSD datasets. The results show that, with multiple visual attention feature enhancements, our method achieves 63.94% mean intersection over union (IOU) on the WHDLD datasets; this result is 4.27% higher than that of UNet, and on the DLRSD datasets, the mean IOU of our methods improves UNet’s 56.17% to 61.90%, while exceeding those of other advanced methods. The implementation code is available on the following Github Link.
AB - In recent years, with the development of deep learning, semantic segmentation for remote sensing images has gradually become a hot issue in computer vision. However, segmentation for multicategory targets is still a difficult problem. To address the issues regarding poor precision and multiple scales in different categories, we propose a UNet, based on multi-attention (MA-UNet). Specifically, we propose a residual encoder, based on a simple attention module, to improve the extraction capability of the backbone for fine-grained features. By using multi-head self-attention for the lowest level feature, the semantic representation of the given feature map is reconstructed, further implementing fine-grained segmentation for different categories of pixels. Then, to address the problem of multiple scales in different categories, we increase the number of down-sampling to subdivide the feature sizes of the target at different scales, and use channel attention and spatial attention in different feature fusion stages, to better fuse the feature information of the target at different scales. We conducted experiments on the WHDLD datasets and DLRSD datasets. The results show that, with multiple visual attention feature enhancements, our method achieves 63.94% mean intersection over union (IOU) on the WHDLD datasets; this result is 4.27% higher than that of UNet, and on the DLRSD datasets, the mean IOU of our methods improves UNet’s 56.17% to 61.90%, while exceeding those of other advanced methods. The implementation code is available on the following Github Link.
KW - channel attention
KW - deep learning
KW - image segmentation
KW - multi-head self-attention
KW - remote sensing
KW - spatial attention
UR - http://www.scopus.com/inward/record.url?scp=85131783013&partnerID=8YFLogxK
U2 - 10.3390/sym14050906
DO - 10.3390/sym14050906
M3 - Article
AN - SCOPUS:85131783013
SN - 2073-8994
VL - 14
JO - Symmetry
JF - Symmetry
IS - 5
M1 - 906
ER -