TY - JOUR
T1 - HR and LiDAR Data Collaborative Semantic Segmentation Based on Adaptive Cross-Modal Fusion Network
AU - Ye, Zhen
AU - Li, Zhen
AU - Wang, Nan
AU - Li, Yuan
AU - Li, Wei
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024
Y1 - 2024
N2 - Semantic segmentation using cross-modal data is a hot topic in the field of Earth observation. Compared with single-modal strategies, cross-modal networks fuse complementary information and yield higher segmentation accuracy, which benefits applications such as urban planning and environmental monitoring. In this study, an end-to-end adaptive cross-modal fusion network (ACFNet) is proposed for the semantic segmentation task using high-resolution (HR) and light detection and ranging (LiDAR) images. Because of differences in sensor resolution, data of different modalities represent ground objects with different levels of detail. Therefore, multimodal data fusion should consider features at different spatial scales, whereas most existing methods simply fuse features at the same spatial scale. In this work, we first design an adaptive scale fusion module that automatically selects the features with optimal spatial scales, making full use of the detailed representations of ground objects. Second, an important feature guidance module is designed to estimate the influence weights of deep semantic features and shallow spatial detail features, achieving adaptive deep-shallow feature fusion and reducing the semantic-spatial information dilution caused by layer-by-layer upsampling and downsampling. Finally, we introduce a divide Fourier context learning (DFCL) module to transform feature maps from the spatial domain to the frequency domain. Compared with the limited receptive field of spatial convolution kernels, the DFCL module readily models the contextual dependencies of cross-modal features, improving segmentation accuracy for complex urban ground objects, especially under occlusion. To demonstrate the generalization performance of our method, we conduct extensive experiments and ablation studies on three datasets: Potsdam, Vaihingen, and IEEE GRSS DFC 2018. The results show that the proposed ACFNet is effective for semantic segmentation.
AB - Semantic segmentation using cross-modal data is a hot topic in the field of Earth observation. Compared with single-modal strategies, cross-modal networks fuse complementary information and yield higher segmentation accuracy, which benefits applications such as urban planning and environmental monitoring. In this study, an end-to-end adaptive cross-modal fusion network (ACFNet) is proposed for the semantic segmentation task using high-resolution (HR) and light detection and ranging (LiDAR) images. Because of differences in sensor resolution, data of different modalities represent ground objects with different levels of detail. Therefore, multimodal data fusion should consider features at different spatial scales, whereas most existing methods simply fuse features at the same spatial scale. In this work, we first design an adaptive scale fusion module that automatically selects the features with optimal spatial scales, making full use of the detailed representations of ground objects. Second, an important feature guidance module is designed to estimate the influence weights of deep semantic features and shallow spatial detail features, achieving adaptive deep-shallow feature fusion and reducing the semantic-spatial information dilution caused by layer-by-layer upsampling and downsampling. Finally, we introduce a divide Fourier context learning (DFCL) module to transform feature maps from the spatial domain to the frequency domain. Compared with the limited receptive field of spatial convolution kernels, the DFCL module readily models the contextual dependencies of cross-modal features, improving segmentation accuracy for complex urban ground objects, especially under occlusion. To demonstrate the generalization performance of our method, we conduct extensive experiments and ablation studies on three datasets: Potsdam, Vaihingen, and IEEE GRSS DFC 2018. The results show that the proposed ACFNet is effective for semantic segmentation.
KW - Adaptive learning
KW - aerial imagery
KW - cross-modal fusion
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85197635497&partnerID=8YFLogxK
U2 - 10.1109/JSTARS.2024.3418387
DO - 10.1109/JSTARS.2024.3418387
M3 - Article
AN - SCOPUS:85197635497
SN - 1939-1404
VL - 17
SP - 12153
EP - 12168
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -