TY - JOUR
T1 - Spectral–Spatial Transformer Network for Hyperspectral Image Classification
T2 - A Factorized Architecture Search Framework
AU - Zhong, Zilong
AU - Li, Ying
AU - Ma, Lingfei
AU - Li, Jonathan
AU - Zheng, Wei-Shi
N1 - Publisher Copyright:
1558-0644 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
PY - 2022
Y1 - 2022
N2 - Neural networks have dominated research on hyperspectral image (HSI) classification, owing to the feature learning capacity of convolution operations. However, the fixed geometric structure of convolution kernels hinders long-range interaction between features from distant locations. In this article, we propose a novel spectral–spatial transformer network (SSTN), which consists of spatial attention and spectral association modules, to overcome the constraints of convolution kernels. We also design a factorized architecture search (FAS) framework that involves two independent subprocedures to determine the layer-level operation choices and block-level orders of SSTN. Unlike conventional neural architecture search (NAS), which requires a bilevel optimization of both network parameters and architecture settings, FAS focuses only on finding optimal architecture settings, enabling a stable and fast architecture search. Extensive experiments conducted on five popular HSI benchmarks demonstrate the versatility of SSTNs over other state-of-the-art (SOTA) methods and justify the FAS strategy. On the University of Houston dataset, SSTN obtains overall accuracy comparable to SOTA methods with only a small fraction (1.2%) of the multiply-and-accumulate operations (MACs) of a strong baseline, the spectral–spatial residual network (SSRN). Most importantly, SSTNs outperform other SOTA networks using only 1.2% or fewer of the MACs of SSRNs on the Indian Pines, Kennedy Space Center, University of Pavia, and Pavia Center datasets.
AB - Neural networks have dominated research on hyperspectral image (HSI) classification, owing to the feature learning capacity of convolution operations. However, the fixed geometric structure of convolution kernels hinders long-range interaction between features from distant locations. In this article, we propose a novel spectral–spatial transformer network (SSTN), which consists of spatial attention and spectral association modules, to overcome the constraints of convolution kernels. We also design a factorized architecture search (FAS) framework that involves two independent subprocedures to determine the layer-level operation choices and block-level orders of SSTN. Unlike conventional neural architecture search (NAS), which requires a bilevel optimization of both network parameters and architecture settings, FAS focuses only on finding optimal architecture settings, enabling a stable and fast architecture search. Extensive experiments conducted on five popular HSI benchmarks demonstrate the versatility of SSTNs over other state-of-the-art (SOTA) methods and justify the FAS strategy. On the University of Houston dataset, SSTN obtains overall accuracy comparable to SOTA methods with only a small fraction (1.2%) of the multiply-and-accumulate operations (MACs) of a strong baseline, the spectral–spatial residual network (SSRN). Most importantly, SSTNs outperform other SOTA networks using only 1.2% or fewer of the MACs of SSRNs on the Indian Pines, Kennedy Space Center, University of Pavia, and Pavia Center datasets.
KW - Computer architecture
KW - Convolution
KW - Hyperspectral imaging
KW - Kernel
KW - Task analysis
KW - Training
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85117083995&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2021.3115699
DO - 10.1109/TGRS.2021.3115699
M3 - Article
AN - SCOPUS:85117083995
SN - 0196-2892
VL - 60
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
ER -