Abstract
Offline-trained Siamese networks have achieved very promising tracking precision and efficiency. However, their performance is still limited by drawbacks in online update. Traditional update strategies cannot handle irregular object variations and sampling noise, so it is risky to adopt them to update Siamese trackers. In this paper, we present a two-stage one-shot learner by exploring the learning scheme of Siamese networks, which reveals two key issues during online update, i.e., feature fusion and feature comparison. Based on this finding, we propose an updatable Siamese tracker built on two independent transformers (SiamTOL). Concretely, a Cross-aware transformer is designed to combine the features of the initial and the dynamic templates, while a Decoder-favored transformer is exploited to compare the fused template with the search region. By combining these transformers, our tracker is able to adequately model the feature dependencies between multi-frame object samples. Extensive experiments on several popular benchmarks demonstrate that the proposed approach achieves leading performance and outperforms other state-of-the-art trackers.
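To illustrate how the two transformers described in the abstract could be wired together, the following is a minimal PyTorch-style sketch. The module name `SiamTOLHead`, the token dimensions, and the exact attention wiring are assumptions for illustration, not the authors' released implementation: a cross-attention block stands in for the Cross-aware fusion of the initial and dynamic templates, and a standard transformer decoder layer stands in for the Decoder-favored comparison against the search region.

```python
import torch
import torch.nn as nn

class SiamTOLHead(nn.Module):
    """Hypothetical sketch of the two-transformer update scheme (not the official code)."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        # "Cross-aware" fusion: cross-attention between the initial template
        # and the dynamic (online-updated) template.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse_norm = nn.LayerNorm(dim)
        # "Decoder-favored" comparison: a transformer decoder layer whose
        # queries come from the search region and whose memory is the fused template.
        self.compare = nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, init_tmpl, dyn_tmpl, search):
        # init_tmpl, dyn_tmpl: (B, N_t, dim) template token features
        # search:              (B, N_s, dim) search-region token features
        fused, _ = self.cross_attn(init_tmpl, dyn_tmpl, dyn_tmpl)
        fused = self.fuse_norm(fused + init_tmpl)      # residual fusion of both templates
        # Compare the search region against the fused template.
        return self.compare(tgt=search, memory=fused)  # (B, N_s, dim)

# Example: 64 template tokens, 256 search tokens, 256-dim features.
head = SiamTOLHead()
out = head(torch.randn(2, 64, 256), torch.randn(2, 64, 256), torch.randn(2, 256, 256))
print(out.shape)  # torch.Size([2, 256, 256])
```

The output tokens would then feed a classification/regression head for localization; that stage is omitted here since the abstract does not specify it.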
| Original language | English |
|---|---|
| Article number | 109965 |
| Journal | Pattern Recognition |
| Volume | 146 |
| DOIs | |
| Publication status | Published - Feb 2024 |
Keywords
- One-shot learning
- Online update
- Siamese network
- Transformer
- Visual tracking