TY - JOUR
T1 - ISTASTrack
T2 - Bridging ANN and SNN via ISTA Adapter for RGB-Event Tracking
AU - Liu, Siying
AU - Wang, Zikai
AU - Zheng, Hanle
AU - Hu, Yifan
AU - Wang, Xilin
AU - Yang, Qingkai
AU - Wu, Jibin
AU - Guo, Hao
AU - Deng, Lei
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - RGB-Event tracking has become a promising direction in visual object tracking, leveraging the complementary strengths of RGB images and dynamic spike events for improved performance. However, existing artificial neural networks (ANNs) struggle to fully exploit the sparse and asynchronous nature of event streams. Hybrid architectures combining ANNs and spiking neural networks (SNNs) have recently emerged as a promising solution for RGB-Event perception, yet effectively fusing features across heterogeneous paradigms remains a challenge. In this work, we propose ISTASTrack, the first transformer-based ANN-SNN hybrid tracker equipped with ISTA adapters for RGB-Event tracking. The two-branch model employs a vision transformer to extract spatial context from RGB inputs and a spiking transformer to capture spatio-temporal dynamics from event streams. To bridge the modality and paradigm gap between ANN and SNN features, we design an ISTA adapter for bidirectional feature interaction between the two branches. The ISTA adapter is derived from sparse representation theory by unfolding the iterative shrinkage-thresholding algorithm (ISTA). Additionally, we incorporate a temporal downsampling attention module within the adapter to align multi-step SNN features with single-step ANN features in the latent space. Experimental results on RGB-Event tracking benchmarks, including FE240hz, VisEvent, COESOT, and FELT, demonstrate that ISTASTrack achieves state-of-the-art performance while maintaining high energy efficiency. This work highlights the effectiveness and practicality of hybrid ANN-SNN designs for robust visual tracking.
KW - Hybrid neural networks
KW - multimodal object tracking
KW - RGB-Event fusion
KW - sparse representation
KW - spiking neural networks
UR - https://www.scopus.com/pages/publications/105039884167
U2 - 10.1109/TIP.2026.3694138
DO - 10.1109/TIP.2026.3694138
M3 - Article
AN - SCOPUS:105039884167
SN - 1057-7149
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -