TY - JOUR
T1 - Bring KANs to Tracker
T2 - A Nonlinear Fusion Strategy Over Frequency Decoupling for Temporal Hyperspectral Object Tracking
AU - Wang, Hanzheng
AU - Li, Wei
AU - Xia, Xiang Gen
AU - Cui, Bolun
AU - Shi, Zhicheng
AU - Lin, Hongyang
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Hyperspectral (HS) cameras have great potential in extracting spectral, textural, and temporal information from objects. Many existing works leverage HS data for object tracking, as it provides unique spectral features that can help address challenges like background clutter (BC) or camouflage. However, most of these methods overlook the rich temporal information available in video sequences, and many spectral–visual fusion approaches fail to extract contextual information from a global perspective, leaving the model with an incomplete understanding of the entire object. To address these issues, a spectral–temporal tracking Transformer based on a frequency-domain fusion strategy (S3T-FFS) is proposed. First, a spectral–temporal token is proposed to capture an object’s spectral information that remains unchanged across video clips, providing additional tracking cues. Second, to extract spectral semantic information, we propose a frequency-domain fusion strategy (FFS), comprising a frequency attention network (FAN) and a Kolmogorov–Arnold network-based convolutional unit (CuKAN), to provide spectral information for the tracking model. Specifically, FAN is designed for the simultaneous extraction and fusion of spectral features. This synchronous modeling approach decouples low-frequency and high-frequency information, adjusting their balance through frequency-domain prior knowledge and self-adaptive weights. This allows the fusion network to focus more on global information. To further extract global patterns from the low-frequency features and improve the network’s interpretability, we introduce CuKAN to extract nonlinear relationships within the decoupled frequency components; its learnable activation functions help the model learn global patterns, thus avoiding the local overfitting seen in convolutional neural networks. Extensive experiments on multiple large-scale datasets demonstrate the effectiveness of the proposed methods.
AB - Hyperspectral (HS) cameras have great potential in extracting spectral, textural, and temporal information from objects. Many existing works leverage HS data for object tracking, as it provides unique spectral features that can help address challenges like background clutter (BC) or camouflage. However, most of these methods overlook the rich temporal information available in video sequences, and many spectral–visual fusion approaches fail to extract contextual information from a global perspective, leaving the model with an incomplete understanding of the entire object. To address these issues, a spectral–temporal tracking Transformer based on a frequency-domain fusion strategy (S3T-FFS) is proposed. First, a spectral–temporal token is proposed to capture an object’s spectral information that remains unchanged across video clips, providing additional tracking cues. Second, to extract spectral semantic information, we propose a frequency-domain fusion strategy (FFS), comprising a frequency attention network (FAN) and a Kolmogorov–Arnold network-based convolutional unit (CuKAN), to provide spectral information for the tracking model. Specifically, FAN is designed for the simultaneous extraction and fusion of spectral features. This synchronous modeling approach decouples low-frequency and high-frequency information, adjusting their balance through frequency-domain prior knowledge and self-adaptive weights. This allows the fusion network to focus more on global information. To further extract global patterns from the low-frequency features and improve the network’s interpretability, we introduce CuKAN to extract nonlinear relationships within the decoupled frequency components; its learnable activation functions help the model learn global patterns, thus avoiding the local overfitting seen in convolutional neural networks. Extensive experiments on multiple large-scale datasets demonstrate the effectiveness of the proposed methods.
KW - Deep learning
KW - Kolmogorov–Arnold network
KW - frequency attention
KW - hyperspectral object tracking (HOT)
UR - https://www.scopus.com/pages/publications/105028903928
U2 - 10.1109/TGRS.2026.3658509
DO - 10.1109/TGRS.2026.3658509
M3 - Article
AN - SCOPUS:105028903928
SN - 0196-2892
VL - 64
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5502714
ER -