Abstract
Scene classification has become an active research area in remote sensing (RS) image interpretation. Recently, Transformer-based methods have shown great potential in modeling global semantic information and have been applied to RS scene classification. In this letter, we propose a multi-level fusion Swin Transformer (MFST), which integrates a multi-level feature merging (MFM) module and an adaptive feature compression (AFC) module to further improve RS scene classification performance. The MFM module narrows the semantic gaps between multi-level features via patch merging in lower-level feature maps and lateral connections in the top-down pathway. The AFC module applies adaptive channel reduction, giving the multi-level features smaller dimensions and more coherent semantic information. We evaluate the proposed network on the Aerial Image Dataset (AID) and NWPU-RESISC45 (NWPU) datasets, and the classification results show that it outperforms several state-of-the-art (SOTA) methods.
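As a rough illustration of the patch merging operation mentioned above, the following minimal sketch shows the standard Swin-style rearrangement: each 2x2 spatial neighbourhood is concatenated along the channel axis, halving the resolution and quadrupling the channel count. The abstract does not specify how MFST uses this inside the MFM module, so the `align_to_top` helper, the `make_map` and `shape` utilities, and the toy feature-map sizes are all hypothetical; the lateral connections and the AFC channel reduction are not modelled here.

```python
def patch_merge(fmap):
    """Swin-style patch merging on an H x W x C nested list.

    Each 2x2 neighbourhood is concatenated along the channel axis, giving an
    (H/2) x (W/2) x 4C map. Swin follows this with LayerNorm and a linear
    layer (4C -> 2C); that learned part is omitted in this sketch.
    """
    h, w = len(fmap), len(fmap[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    return [
        [
            fmap[i][j] + fmap[i + 1][j] + fmap[i][j + 1] + fmap[i + 1][j + 1]
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def shape(fmap):
    """Return (height, width, channels) of a nested-list feature map."""
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))


def make_map(h, w, c):
    """Build a toy H x W x C feature map filled with position indices."""
    return [[[float(i * w + j)] * c for j in range(w)] for i in range(h)]


def align_to_top(levels):
    """Merge each lower-level map until it matches the top level's size.

    Hypothetical alignment step for illustration only; it is an assumption,
    not the MFM module as defined in the paper.
    """
    top_h, top_w, _ = shape(levels[-1])
    aligned = []
    for fmap in levels:
        while shape(fmap)[:2] != (top_h, top_w):
            fmap = patch_merge(fmap)  # raises ValueError if sizes cannot match
        aligned.append(fmap)
    return aligned


# Toy three-level pyramid: resolution halves and channels double per level.
levels = [make_map(8, 8, 2), make_map(4, 4, 4), make_map(2, 2, 8)]
for fmap in align_to_top(levels):
    print(shape(fmap))
```

After alignment, all levels share one spatial size but the lower levels carry many more channels, which is the kind of imbalance a channel-reduction step such as AFC would plausibly address.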
Original language | English |
---|---|
Article number | 6516005 |
Journal | IEEE Geoscience and Remote Sensing Letters |
Volume | 19 |
Publication status | Published - 2022 |
Keywords
- Feature fusion
- Transformer
- Multi-level
- Remote sensing (RS) scene classification