Siamese Visual Tracking with Multi-Parallel Interactive Transformers

Wuwei Wang, Meibo Lv*, Lin Zhu, Tuo Han, Yi Zhang, Yuanqing Li

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

In recent years, Siamese network-based visual tracking methods have achieved considerable popularity and success owing to their efficiency and accuracy. However, typical Siamese trackers use two independent weight-sharing streams to describe the exemplar and the search region, with no interaction between the two streams. As a result, such trackers rely only on shallow cross-correlation or correlation filters to associate the two streams' features at the final stage, which neglects deep interaction between the exemplar and the search region and may reduce the trackers' discriminative power. To address this issue, we propose a novel multi-parallel interactive transformer-based (MPIT) tracking framework that introduces sufficient interaction between the two streams so that they can more easily guide the prediction heads to focus on the target. Unlike recent one-stream transformer-based trackers that directly concatenate template and search tokens to perform joint feature learning, our multi-parallel interactive framework introduces a transmission band module that delivers global information to both the exemplar and the search region at low computational cost. Moreover, to integrate dynamic information, we incorporate temporal-level extraction into the tracking framework to increase the variety of the templates. Experimental results show that the proposed MPIT method achieves a tracking speed of 136 frames per second (FPS) while attaining performance better than or comparable to that of state-of-the-art trackers.
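To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of the two baseline association schemes it mentions: SiamFC-style cross-correlation between independently computed exemplar and search features, and one-stream joint attention over concatenated template and search tokens. It is not the MPIT transmission band module, whose internals are not described here. The function names `cross_correlation` and `joint_attention` are hypothetical, and the attention uses identity query/key/value projections and a single head purely as a simplifying assumption.

```python
import numpy as np


def cross_correlation(z, x):
    """Slide template features z (C, h, w) over search features x (C, H, W).

    Returns an (H - h + 1, W - w + 1) response map whose entries are inner
    products between z and the co-located window of x. In a classic Siamese
    tracker this is the only point where the two streams interact.
    """
    c, h, w = z.shape
    c2, H, W = x.shape
    assert c == c2, "template and search features need the same channel count"
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return out


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def joint_attention(z_tokens, x_tokens):
    """One-stream style association (simplified, hypothetical).

    Concatenates template tokens (Nz, D) and search tokens (Nx, D), then runs
    single-head self-attention with identity projections over the joint
    sequence, so every token can attend to every other token.
    """
    t = np.concatenate([z_tokens, x_tokens], axis=0)  # (Nz + Nx, D)
    scores = t @ t.T / np.sqrt(t.shape[1])            # (N, N) pairwise affinities
    out = softmax(scores, axis=-1) @ t                # weighted mix of all tokens
    nz = len(z_tokens)
    return out[:nz], out[nz:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((4, 3, 3))  # toy exemplar features
    x = rng.standard_normal((4, 8, 8))  # toy search-region features
    print(cross_correlation(z, x).shape)  # (6, 6)

    zt, xt = joint_attention(rng.standard_normal((9, 16)),
                             rng.standard_normal((64, 16)))
    print(zt.shape, xt.shape)  # (9, 16) (64, 16)
```

The score matrix in `joint_attention` has (Nz + Nx)^2 entries, so the cost of joint feature learning grows quadratically with the total token count, whereas cross-correlation exchanges information only once and only locally. This trade-off is the gap the abstract says the transmission band module addresses.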

Original language: English
Article number: 0b00006493f017d9
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • interaction information
  • multi-parallel transformers
  • Siamese networks
  • Visual tracking
