Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion-Based Transformer Network for Remote Sensing Image Super-Resolution

Yuting Lu; Lingtong Min; Binglu Wang; Le Zheng; Xiaoxu Wang; Yongqiang Zhao; Le Yang; Teng Long

doi:10.1109/TGRS.2023.3334490

Yuting Lu, Lingtong Min, Binglu Wang, Le Zheng, Xiaoxu Wang*, Yongqiang Zhao, Le Yang*, Teng Long

* Corresponding author for this work

School of Information and Electronics

Research output: Journal article (peer-reviewed)

2 Citations (Scopus)

Abstract

Remote sensing image super-resolution (RSISR) plays a vital role in enhancing spatial details and improving the quality of satellite imagery. Recently, transformer-based models have shown competitive performance in RSISR. To mitigate the quadratic computational complexity of global self-attention, various methods restrict attention to a local window, which improves efficiency. As a consequence, however, the receptive field of a single attention layer is limited, leading to insufficient context modeling. Furthermore, while most transformer-based approaches reuse shallow features through skip connections, relying solely on these connections treats shallow and deep features equally, impeding the model's ability to characterize them. To address these issues, we propose a novel transformer architecture for RSISR, the cross-spatial pixel integration and cross-stage feature fusion-based transformer network (SPIFFNet). The proposed model enhances context cognition and understanding of the entire image while enabling efficient integration of cross-stage features. It incorporates cross-spatial pixel integration attention (CSPIA), which introduces contextual information into a local window, and cross-stage feature fusion attention (CSFFA), which adaptively fuses features from the previous stage to improve feature expression in line with the requirements of the current stage. Comprehensive experiments on multiple benchmark datasets demonstrate the superior performance of SPIFFNet over state-of-the-art methods in terms of both quantitative metrics and visual quality. Our code is available at https://github.com/Dr-Lyt/SPIFFNet.
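The efficiency/context trade-off described in the abstract can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' code: single-head attention with identity query/key/value projections, non-overlapping m x m windows in the Swin style, and hypothetical helper names (`window_attention`, `score_entries`, `cross_stage_fusion`). `cross_stage_fusion` is a generic cross-attention shown only as a contrast to a plain skip connection (`current + previous`); it is not the paper's CSPIA or CSFFA, whose actual implementation is in the linked repository.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Single-head self-attention with identity Q/K/V projections (a simplification).
    # tokens: (n, c); the (n, n) score matrix is what makes global attention quadratic.
    c = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(c)
    return softmax(scores) @ tokens


def window_partition(x, m):
    # (h, w, c) -> (num_windows, m*m, c); h and w must be divisible by m.
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


def window_reverse(windows, m, h, w):
    # Inverse of window_partition: (num_windows, m*m, c) -> (h, w, c).
    c = windows.shape[-1]
    x = windows.reshape(h // m, w // m, m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def window_attention(x, m):
    # Self-attention restricted to non-overlapping m x m windows.
    h, w, _ = x.shape
    out = np.stack([self_attention(win) for win in window_partition(x, m)])
    return window_reverse(out, m, h, w)


def score_entries(h, w, m=None):
    # Query-key scores per head: (hw)^2 globally, hw * m^2 with m x m windows.
    n = h * w
    return n * n if m is None else n * m * m


def cross_stage_fusion(current, previous, rng):
    # Hypothetical cross-attention fusion (not the paper's CSFFA): queries come from
    # the current stage, keys/values from the previous stage, so each token weights
    # earlier features instead of adding them uniformly as a skip connection would.
    c = current.shape[-1]
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    q, k, v = current @ wq, previous @ wk, previous @ wv
    weights = softmax(q @ k.T / np.sqrt(c))  # rows sum to 1
    return current + weights @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 16, 4))
    x2 = x.copy()
    x2[0, 0] += 10.0  # perturb one pixel in the top-left window
    y, y2 = window_attention(x, 8), window_attention(x2, 8)
    print(np.allclose(y[8:, 8:], y2[8:, 8:]))  # True: other windows never see it
    print(score_entries(64, 64), score_entries(64, 64, m=8))  # 16777216 262144
    cur, prev = rng.standard_normal((64, 4)), rng.standard_normal((64, 4))
    print(cross_stage_fusion(cur, prev, rng).shape)  # (64, 4)
```

Perturbing one pixel in the top-left window leaves the bottom-right window's output unchanged, which is the limited single-layer receptive field the abstract refers to. For a 64 x 64 feature map with 8 x 8 windows, the number of attention scores per head drops from 16,777,216 to 262,144, a 64-fold reduction equal to (hw) / m^2.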

Original language	English
Article number	5625616
Journal	IEEE Transactions on Geoscience and Remote Sensing
Volume	61
DOIs	https://doi.org/10.1109/TGRS.2023.3334490
Publication status	Published - 2023

Keywords

Cross-spatial pixel integration
cross-stage feature fusion
remote sensing image super-resolution (RSISR)
transformer network

Cite this

@article{75e31882a57546dda61f73253e778286,

title = "Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion-Based Transformer Network for Remote Sensing Image Super-Resolution",

keywords = "Cross-spatial pixel integration, cross-stage feature fusion, remote sensing image super-resolution (RSISR), transformer network",

author = "Yuting Lu and Lingtong Min and Binglu Wang and Le Zheng and Xiaoxu Wang and Yongqiang Zhao and Le Yang and Teng Long",

note = "Publisher Copyright: {\textcopyright} 1980-2012 IEEE.",

year = "2023",

doi = "10.1109/TGRS.2023.3334490",

language = "English",

volume = "61",

journal = "IEEE Transactions on Geoscience and Remote Sensing",

issn = "0196-2892",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Cross-Spatial Pixel Integration and Cross-Stage Feature Fusion-Based Transformer Network for Remote Sensing Image Super-Resolution

AU - Lu, Yuting

AU - Min, Lingtong

AU - Wang, Binglu

AU - Zheng, Le

AU - Wang, Xiaoxu

AU - Zhao, Yongqiang

AU - Yang, Le

AU - Long, Teng

PY - 2023

Y1 - 2023

KW - Cross-spatial pixel integration

KW - cross-stage feature fusion

KW - remote sensing image super-resolution (RSISR)

KW - transformer network

UR - http://www.scopus.com/inward/record.url?scp=85178067683&partnerID=8YFLogxK

U2 - 10.1109/TGRS.2023.3334490

DO - 10.1109/TGRS.2023.3334490

M3 - Article

AN - SCOPUS:85178067683

SN - 0196-2892

VL - 61

JO - IEEE Transactions on Geoscience and Remote Sensing

JF - IEEE Transactions on Geoscience and Remote Sensing

M1 - 5625616

ER -
