Abstract
In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from both semantic and visual distractors. It measures target similarity using both appearance information and structural information: the former is extracted by Siamese networks, and the latter is learned from the appearance information using a target-cross attention mechanism. The structural and appearance information are fused dynamically by a gated recurrent unit, which controls the fusion ratio between them. Additionally, we introduce a similarity matching loss function to explicitly guide feature extraction. The proposed method extracts discriminative features that facilitate identification of the target. Extensive experimental results show that the proposed similarity feature extraction method improves tracking performance.
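The gated fusion idea can be illustrated with a minimal sketch. The abstract does not give the actual formulation, so everything here is an assumption for illustration only: the element-wise gate, the function names `gated_fusion` and `cosine_similarity`, and the parameters `w_app`, `w_str`, and `bias` are hypothetical and are not taken from the paper. The sketch shows how a learned gate in (0, 1) could set the ratio between an appearance feature and a structural feature before a similarity score is computed.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    appearance: Sequence[float],
    structural: Sequence[float],
    w_app: Sequence[float],
    w_str: Sequence[float],
    bias: Sequence[float],
) -> List[float]:
    """Fuse two feature vectors with an element-wise gate (hypothetical form).

    gate_i  = sigmoid(w_app_i * a_i + w_str_i * s_i + bias_i)
    fused_i = gate_i * a_i + (1 - gate_i) * s_i
    """
    sizes = {len(appearance), len(structural), len(w_app), len(w_str), len(bias)}
    if len(sizes) != 1:
        raise ValueError("all inputs must have the same length")
    fused = []
    for a, s, wa, ws, b in zip(appearance, structural, w_app, w_str, bias):
        gate = sigmoid(wa * a + ws * s + b)  # fusion ratio for this channel
        fused.append(gate * a + (1.0 - gate) * s)
    return fused


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Zero weights and bias give a gate of 0.5, i.e. a plain average.
template = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
candidate = gated_fusion([2.0, 6.0], [6.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
score = cosine_similarity(template, candidate)
```

In this simplified form, a strongly positive gate input makes the fused feature follow the appearance branch, and a strongly negative one makes it follow the structural branch. In a trained tracker the gate parameters would be learned, for example under a similarity matching loss.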
Original language | English |
---|---|
Pages (from-to) | 3103-3117 |
Number of pages | 15 |
Journal | Visual Computer |
Volume | 40 |
Issue number | 5 |
Publication status | Published - May 2024 |
Keywords
- Multi-information fusion
- Siamese networks
- Visual tracking