Abstract
RGB-Event tracking has become a promising direction in visual object tracking, leveraging the complementary strengths of RGB images and dynamic spike events for improved performance. However, existing artificial neural networks (ANNs) struggle to fully exploit the sparse and asynchronous nature of event streams. Hybrid architectures that combine ANNs with spiking neural networks (SNNs) have recently emerged as a promising solution for RGB-Event perception, yet effectively fusing features across heterogeneous paradigms remains a challenge. In this work, we propose ISTASTrack, the first transformer-based ANN-SNN hybrid tracker equipped with ISTA adapters for RGB-Event tracking. The two-branch model employs a vision transformer to extract spatial context from RGB inputs and a spiking transformer to capture spatio-temporal dynamics from event streams. To bridge the modality and paradigm gap between ANN and SNN features, we systematically design an ISTA adapter for bidirectional feature interaction between the two branches. The ISTA adapter is derived from sparse representation theory by unfolding the iterative shrinkage-thresholding algorithm (ISTA). Additionally, we incorporate a temporal downsampling attention module within the adapter to align multi-step SNN features with single-step ANN features in the latent space. Experimental results on RGB-Event tracking benchmarks, including FE240hz, VisEvent, COESOT, and FELT, demonstrate that ISTASTrack achieves state-of-the-art performance while maintaining high energy efficiency. This work highlights the effectiveness and practicality of hybrid ANN-SNN designs for robust visual tracking. The code is publicly available at https://github.com/lsying009/ISTASTrack.git.
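For readers unfamiliar with the algorithm the adapter is unfolded from, the following is a minimal sketch of classical ISTA for the sparse coding problem min_z 0.5 * ||D z - y||^2 + lam * ||z||_1. It is not the ISTASTrack adapter itself: the function names, the fixed dictionary `D`, and the parameters `lam` and `n_iters` are assumptions chosen for illustration only.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrinkage operator: pull every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def objective(D, y, z, lam):
    """Sparse coding objective: 0.5 * ||D z - y||^2 + lam * ||z||_1."""
    return 0.5 * np.sum((D @ z - y) ** 2) + lam * np.sum(np.abs(z))


def ista(D, y, lam=0.1, n_iters=200):
    """Classical ISTA with a fixed step size of 1 / L.

    L is the squared spectral norm of D, i.e. the Lipschitz constant
    of the gradient of the quadratic data term.
    """
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - y)                   # gradient of the data term
        z = soft_threshold(z - grad / L, lam / L)  # gradient step, then shrinkage
    return z
```

In unfolded variants of this scheme, each iteration is typically treated as one network layer, and the fixed dictionary, step size, and threshold are replaced by learned parameters. How the ISTA adapter parameterizes these steps is not specified in the abstract.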
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 5423-5438 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 35 |
| Publication status | Published - 2026 |
| Externally published | Yes |
Keywords
- Hybrid neural networks
- RGB-event fusion
- Multimodal object tracking
- Sparse representation
- Spiking neural networks
Title
ISTASTrack: Bridging ANN and SNN via ISTA Adapter for RGB-Event Tracking