A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images

Yu Sun; Fukun Bi; Yangte Gao; Liang Chen; Suting Feng

doi:10.3390/sym14050906

Corresponding author: Fukun Bi

School of Information and Electronics

Research output: Contribution to journal › Article › peer-review

46 Citations (Scopus)

Abstract

In recent years, with the development of deep learning, semantic segmentation of remote sensing images has become an active topic in computer vision. However, segmentation of multicategory targets remains difficult. To address poor precision and the multiple scales found across categories, we propose a UNet based on multi-attention (MA-UNet). Specifically, we propose a residual encoder, based on a simple attention module, to improve the backbone's ability to extract fine-grained features. Applying multi-head self-attention to the lowest-level feature reconstructs the semantic representation of the feature map, further enabling fine-grained segmentation of pixels in different categories. Then, to address the problem of multiple scales across categories, we increase the number of down-sampling steps to subdivide the feature sizes of targets at different scales, and use channel attention and spatial attention in different feature fusion stages to better fuse the feature information of targets at different scales. We conducted experiments on the WHDLD and DLRSD datasets. The results show that, with multiple visual attention feature enhancements, our method achieves 63.94% mean intersection over union (IoU) on WHDLD, which is 4.27% higher than that of UNet; on DLRSD, our method improves the mean IoU from UNet’s 56.17% to 61.90%, while exceeding other advanced methods. The implementation code is available on the following GitHub link.
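The abstract reports its results as mean intersection over union. As a rough illustration of that metric only, the sketch below computes it for two flattened label maps in plain Python. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both maps are assumptions made for this example; the paper's own evaluation code may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes that occur in pred or target.

    Illustrative only: class labels are assumed to be integers in
    [0, num_classes), and classes absent from both maps are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps, flattened row by row (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"{score:.4f}")
```

Here the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Averaging per class rather than per pixel keeps rare categories from being swamped by frequent ones, which matters for the multicategory setting the abstract describes.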

Original language	English
Article number	906
Journal	Symmetry
Volume	14
Issue number	5
DOIs	https://doi.org/10.3390/sym14050906
Publication status	Published - May 2022

Keywords

channel attention
deep learning
image segmentation
multi-head self-attention
remote sensing
spatial attention

Cite this

Sun, Y., Bi, F., Gao, Y., Chen, L., & Feng, S. (2022). A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images. Symmetry, 14(5), Article 906. https://doi.org/10.3390/sym14050906

@article{0895df36a1af47919cf5c810d5ea6cd9,

title = "A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images",

abstract = "In recent years, with the development of deep learning, semantic segmentation for remote sensing images has gradually become a hot issue in computer vision. However, segmentation for multicategory targets is still a difficult problem. To address the issues regarding poor precision and multiple scales in different categories, we propose a UNet, based on multi-attention (MA-UNet). Specifically, we propose a residual encoder, based on a simple attention module, to improve the extraction capability of the backbone for fine-grained features. By using multi-head self-attention for the lowest level feature, the semantic representation of the given feature map is reconstructed, further implementing fine-grained segmentation for different categories of pixels. Then, to address the problem of multiple scales in different categories, we increase the number of down-sampling to subdivide the feature sizes of the target at different scales, and use channel attention and spatial attention in different feature fusion stages, to better fuse the feature information of the target at different scales. We conducted experiments on the WHDLD datasets and DLRSD datasets. The results show that, with multiple visual attention feature enhancements, our method achieves 63.94% mean intersection over union (IOU) on the WHDLD datasets; this result is 4.27% higher than that of UNet, and on the DLRSD datasets, the mean IOU of our methods improves UNet{\textquoteright}s 56.17% to 61.90%, while exceeding those of other advanced methods. The implementation code is available on the following Github Link.",

keywords = "channel attention, deep learning, image segmentation, multi-head self-attention, remote sensing, spatial attention",

author = "Yu Sun and Fukun Bi and Yangte Gao and Liang Chen and Suting Feng",

note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = may,

doi = "10.3390/sym14050906",

language = "English",

volume = "14",

journal = "Symmetry",

issn = "2073-8994",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "5",

}

TY - JOUR

T1 - A Multi-Attention UNet for Semantic Segmentation in Remote Sensing Images

AU - Sun, Yu

AU - Bi, Fukun

AU - Gao, Yangte

AU - Chen, Liang

AU - Feng, Suting

PY - 2022/5

Y1 - 2022/5

N2 - In recent years, with the development of deep learning, semantic segmentation for remote sensing images has gradually become a hot issue in computer vision. However, segmentation for multicategory targets is still a difficult problem. To address the issues regarding poor precision and multiple scales in different categories, we propose a UNet, based on multi-attention (MA-UNet). Specifically, we propose a residual encoder, based on a simple attention module, to improve the extraction capability of the backbone for fine-grained features. By using multi-head self-attention for the lowest level feature, the semantic representation of the given feature map is reconstructed, further implementing fine-grained segmentation for different categories of pixels. Then, to address the problem of multiple scales in different categories, we increase the number of down-sampling to subdivide the feature sizes of the target at different scales, and use channel attention and spatial attention in different feature fusion stages, to better fuse the feature information of the target at different scales. We conducted experiments on the WHDLD datasets and DLRSD datasets. The results show that, with multiple visual attention feature enhancements, our method achieves 63.94% mean intersection over union (IOU) on the WHDLD datasets; this result is 4.27% higher than that of UNet, and on the DLRSD datasets, the mean IOU of our methods improves UNet’s 56.17% to 61.90%, while exceeding those of other advanced methods. The implementation code is available on the following Github Link.

KW - channel attention

KW - deep learning

KW - image segmentation

KW - multi-head self-attention

KW - remote sensing

KW - spatial attention

UR - http://www.scopus.com/inward/record.url?scp=85131783013&partnerID=8YFLogxK

U2 - 10.3390/sym14050906

DO - 10.3390/sym14050906

M3 - Article

AN - SCOPUS:85131783013

SN - 2073-8994

VL - 14

JO - Symmetry

JF - Symmetry

IS - 5

M1 - 906

ER -
