TY - JOUR
T1 - Spectral–Spatial Transformer Network for Hyperspectral Image Classification
T2 - A Factorized Architecture Search Framework
AU - Zhong, Zilong
AU - Li, Ying
AU - Ma, Lingfei
AU - Li, Jonathan
AU - Zheng, Wei-Shi
N1 - Publisher Copyright:
1558-0644 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
PY - 2022
Y1 - 2022
N2 - Neural networks have dominated research on hyperspectral image (HSI) classification, owing to the feature learning capacity of convolution operations. However, the fixed geometric structure of convolution kernels hinders long-range interaction between features from distant locations. In this article, we propose a novel spectral–spatial transformer network (SSTN), which consists of spatial attention and spectral association modules, to overcome the constraints of convolution kernels. We also design a factorized architecture search (FAS) framework that involves two independent subprocedures to determine the layer-level operation choices and block-level orders of SSTN. Unlike conventional neural architecture search (NAS), which requires a bilevel optimization of both network parameters and architecture settings, FAS focuses only on finding optimal architecture settings, enabling a stable and fast architecture search. Extensive experiments conducted on five popular HSI benchmarks demonstrate the versatility of SSTNs over other state-of-the-art (SOTA) methods and justify the FAS strategy. On the University of Houston dataset, SSTN obtains overall accuracy comparable to SOTA methods with only a small fraction (1.2%) of the multiply-and-accumulate operations (MACs) of a strong baseline, the spectral–spatial residual network (SSRN). Most importantly, SSTNs outperform other SOTA networks using only 1.2% or fewer of the MACs of SSRNs on the Indian Pines, Kennedy Space Center, University of Pavia, and Pavia Center datasets.
AB - Neural networks have dominated research on hyperspectral image (HSI) classification, owing to the feature learning capacity of convolution operations. However, the fixed geometric structure of convolution kernels hinders long-range interaction between features from distant locations. In this article, we propose a novel spectral–spatial transformer network (SSTN), which consists of spatial attention and spectral association modules, to overcome the constraints of convolution kernels. We also design a factorized architecture search (FAS) framework that involves two independent subprocedures to determine the layer-level operation choices and block-level orders of SSTN. Unlike conventional neural architecture search (NAS), which requires a bilevel optimization of both network parameters and architecture settings, FAS focuses only on finding optimal architecture settings, enabling a stable and fast architecture search. Extensive experiments conducted on five popular HSI benchmarks demonstrate the versatility of SSTNs over other state-of-the-art (SOTA) methods and justify the FAS strategy. On the University of Houston dataset, SSTN obtains overall accuracy comparable to SOTA methods with only a small fraction (1.2%) of the multiply-and-accumulate operations (MACs) of a strong baseline, the spectral–spatial residual network (SSRN). Most importantly, SSTNs outperform other SOTA networks using only 1.2% or fewer of the MACs of SSRNs on the Indian Pines, Kennedy Space Center, University of Pavia, and Pavia Center datasets.
KW - Computer architecture
KW - Convolution
KW - Hyperspectral imaging
KW - Kernel
KW - Task analysis
KW - Training
KW - Transformers
UR - http://www.scopus.com/inward/record.url?scp=85117083995&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2021.3115699
DO - 10.1109/TGRS.2021.3115699
M3 - Article
AN - SCOPUS:85117083995
SN - 0196-2892
VL - 60
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
ER -