An Improved Siamese Tracking Network Based On Self-Attention And Cross-Attention

Yijun Lai, Jianmei Song, Haoping She*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Citation (Scopus)

Abstract

The deep Siamese visual tracking network SiamRPN++ shows unsatisfactory success rate and robustness in complex scenes involving occlusion, large deformation, interference from similar objects, and long-term tracking. To address these problems, we propose an improvement strategy based on self-attention and cross-attention mechanisms. In the backbone, we use channel and spatial self-attention modules; in each of the three RPN modules, we apply different cross-channel attention modules between the template features and the search features; finally, we apply a special self-attention to the similarity feature maps. These techniques effectively suppress interference, improve feature quality, and increase robustness. Compared with the original SiamRPN++ using the parameters from the official open-source framework, PySOT, our network improves robustness by 3% on VOT2018, and accuracy by 2% and success rate by 3% on OTB100.
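
The abstract does not give the internals of the attention modules, so the following is only a minimal sketch of the general idea of channel self-attention and cross-channel attention between a template and a search feature map. It assumes a squeeze-and-excitation style gate (global average pooling followed by a sigmoid) with no learned weights; the function names and the gating form are hypothetical and are not taken from the paper or from PySOT.

```python
import math


def global_avg_pool(feat):
    """Reduce a C x H x W feature map (nested lists) to one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def channel_gate(descriptor):
    """Map per-channel descriptors to weights in (0, 1).

    Assumption: a plain sigmoid stands in for the learned bottleneck
    layers that a real attention module would place here.
    """
    return [1.0 / (1.0 + math.exp(-v)) for v in descriptor]


def scale_channels(feat, weights):
    """Multiply every value of channel c by weights[c]."""
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feat, weights)]


def channel_self_attention(feat):
    """Reweight a feature map using gates computed from the same map."""
    return scale_channels(feat, channel_gate(global_avg_pool(feat)))


def cross_channel_attention(template, search):
    """Reweight the search channels using gates computed from the template.

    Both maps must have the same number of channels; their spatial
    sizes may differ, as template and search regions usually do.
    """
    if len(template) != len(search):
        raise ValueError("template and search need the same channel count")
    return scale_channels(search, channel_gate(global_avg_pool(template)))
```

In this sketch a template channel with a weak average response yields a gate near 0.5, while a strongly responding channel yields a gate near 1, so search channels that match salient template channels are attenuated less than the others.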

Original language: English
Title of host publication: Proceedings of the 35th Chinese Control and Decision Conference, CCDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 466-470
Number of pages: 5
ISBN (Electronic): 9798350334722
Publication status: Published - 2023
Event: 35th Chinese Control and Decision Conference, CCDC 2023 - Yichang, China
Duration: 20 May 2023 - 22 May 2023

Publication series

Name: Proceedings of the 35th Chinese Control and Decision Conference, CCDC 2023

Conference

Conference: 35th Chinese Control and Decision Conference, CCDC 2023
Country/Territory: China
City: Yichang
Period: 20/05/23 - 22/05/23

Keywords

  • Siamese network
  • cross-attention
  • object tracking
  • self-attention
