Abstract
Offline-trained Siamese networks have achieved very promising tracking precision and efficiency. However, their performance is still limited by drawbacks in online update. Traditional update strategies cannot handle irregular object variations and sampling noise, so it is risky to adopt them to update Siamese trackers. In this paper, we present a two-stage one-shot learner by exploring the learning scheme of Siamese networks, which reveals two key issues during online update, i.e., feature fusion and feature comparison. Based on this finding, we propose an updatable Siamese tracker built on two independent transformers (SiamTOL). Concretely, a Cross-aware transformer is designed to combine the features of the initial and the dynamic templates, while a Decoder-favored transformer is exploited to compare the fused template with the search region. By combining these transformers, our tracker is able to adequately model the feature dependencies between multi-frame object samples. Extensive experiments on several popular benchmarks demonstrate that the proposed approach achieves leading performance and outperforms other state-of-the-art trackers.
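To illustrate how the two transformers described in the abstract could be wired together, the following is a minimal PyTorch-style sketch. The module name `SiamTOLHead`, the token dimensions, and the exact attention wiring are assumptions for illustration, not the authors' released implementation: a cross-attention block stands in for the Cross-aware fusion of the initial and dynamic templates, and a standard transformer decoder layer stands in for the Decoder-favored comparison against the search region.

```python
import torch
import torch.nn as nn

class SiamTOLHead(nn.Module):
    """Hypothetical sketch of the two-transformer update scheme (not the official code)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        # "Cross-aware" fusion: cross-attention between the initial template
        # and the dynamic (online-updated) template.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse_norm = nn.LayerNorm(dim)
        # "Decoder-favored" comparison: a transformer decoder layer whose
        # queries come from the search region and whose memory is the fused template.
        self.compare = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, init_tmpl, dyn_tmpl, search):
        # init_tmpl, dyn_tmpl: (B, N_t, dim) template token features
        # search:              (B, N_s, dim) search-region token features
        fused, _ = self.cross_attn(init_tmpl, dyn_tmpl, dyn_tmpl)
        fused = self.fuse_norm(fused + init_tmpl)      # residual fusion of both templates
        # Compare the search region against the fused template.
        return self.compare(tgt=search, memory=fused)  # (B, N_s, dim)

# Example: 64 template tokens, 256 search tokens, 256-dim features.
head = SiamTOLHead()
out = head(torch.randn(2, 64, 256), torch.randn(2, 64, 256), torch.randn(2, 256, 256))
print(out.shape)  # torch.Size([2, 256, 256])
```

The output tokens would then feed a classification/regression head for localization; that stage is omitted here since the abstract does not specify it.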
| Original language | English |
|---|---|
| Article number | 109965 |
| Journal | Pattern Recognition |
| Volume | 146 |
| DOIs | |
| Publication status | Published - Feb 2024 |
Keywords
- One-shot learning
- Online update
- Siamese network
- Transformer
- Visual tracking