TY - JOUR
T1 - Cross-Scale Mixing Attention for Multisource Remote Sensing Data Fusion and Classification
AU - Gao, Yunhao
AU - Zhang, Mengmeng
AU - Wang, Junjie
AU - Li, Wei
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2023
Y1 - 2023
N2 - Hyperspectral and multispectral (HS/MS) image fusion and classification, an important branch of data quality improvement and interpretation, has attracted increasing attention in recent years. However, the unavailability of sensor priors still limits the performance of many traditional fusion methods and consequently degrades the classification results. Although unsupervised methods based on convolutional neural networks (CNNs) have attempted to mitigate these limitations, their difficulty in extracting long-range dependencies hampers performance. To address these impediments, a transformer-based baseline built on the cross-scale mixing attention transformer (CSMFormer) is designed for HS/MS fusion and classification. Specifically, the spatial-spectral mixer (SSMixer) is used to extract long-range dependencies at a large scale. Simultaneously, cross-scale feature calibration is achieved by combining information from the original scale. A nonlinear enhancement module (NLEM) is then designed to encourage feature discrimination. Note that the spatial and spectral mixers can be replaced by any spatial-spectral feature extractor. The proposed CSMFormer is therefore flexible enough for data fusion, land-cover classification, segmentation, and similar tasks. Experiments on data fusion and land-cover classification over two HS/MS wetland remote sensing scenes demonstrate the superiority of the proposed CSMFormer baseline, which improves both data quality and classification precision.
KW - Cross-scale mixing attention transformer (CSMFormer)
KW - data fusion
KW - hyperspectral and multispectral images (HS/MS)
KW - land-covers' classification
KW - long-range dependencies
UR - http://www.scopus.com/inward/record.url?scp=85151517523&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2023.3263362
DO - 10.1109/TGRS.2023.3263362
M3 - Article
AN - SCOPUS:85151517523
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5507815
ER -