Structural-appearance information fusion for visual tracking

Yuping Zhang; Zepeng Yang; Bo Ma; Jiahao Wu; Fusheng Jin

doi:10.1007/s00371-023-03013-7

Yuping Zhang, Zepeng Yang, Bo Ma^*, Jiahao Wu, Fusheng Jin

^*Corresponding author for this work

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from distractors, including both semantical and visual distractors. It measures the similarity of targets using both appearance information and structural information, with the former extracted from siamese networks and the latter learned from appearance information using a target-cross attention mechanism. The structural and appearance information can be dynamically fused by using a gating recurrent unit, which can control the fusion ratio between them. Additionally, we introduce a similarity matching loss function to explicitly guide feature extraction. Our proposed method can extract discriminative features that facilitate the identification of the target, thus improving tracking performance. Extensive experimental results show that our proposed similarity feature extraction method can improve the tracking performance.

Original language	English
Pages (from-to)	3103-3117
Number of pages	15
Journal	Visual Computer
Volume	40
Issue number	5
DOI	https://doi.org/10.1007/s00371-023-03013-7
Publication status	Published - May 2024

Cite this

Zhang, Y., Yang, Z., Ma, B., Wu, J., & Jin, F. (2024). Structural-appearance information fusion for visual tracking. Visual Computer, 40(5), 3103-3117. https://doi.org/10.1007/s00371-023-03013-7

@article{d9d31d5694554de79a1557d5b3a5afe6,

title = "Structural-appearance information fusion for visual tracking",

abstract = "In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from distractors, including both semantical and visual distractors. It measures the similarity of targets using both appearance information and structural information, with the former extracted from siamese networks and the latter learned from appearance information using a target-cross attention mechanism. The structural and appearance information can be dynamically fused by using a gating recurrent unit, which can control the fusion ratio between them. Additionally, we introduce a similarity matching loss function to explicitly guide feature extraction. Our proposed method can extract discriminative features that facilitate the identification of the target, thus improving tracking performance. Extensive experimental results show that our proposed similarity feature extraction method can improve the tracking performance.",

keywords = "Multi-information fusion, Siamese networks, Visual tracking",

author = "Yuping Zhang and Zepeng Yang and Bo Ma and Jiahao Wu and Fusheng Jin",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023.",

year = "2024",

month = may,

doi = "10.1007/s00371-023-03013-7",

language = "English",

volume = "40",

pages = "3103--3117",

journal = "Visual Computer",

issn = "0178-2789",

publisher = "Springer Verlag",

number = "5",

}

TY - JOUR

T1 - Structural-appearance information fusion for visual tracking

AU - Zhang, Yuping

AU - Yang, Zepeng

AU - Ma, Bo

AU - Wu, Jiahao

AU - Jin, Fusheng

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023.

PY - 2024/5

Y1 - 2024/5

N2 - In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from distractors, including both semantical and visual distractors. It measures the similarity of targets using both appearance information and structural information, with the former extracted from siamese networks and the latter learned from appearance information using a target-cross attention mechanism. The structural and appearance information can be dynamically fused by using a gating recurrent unit, which can control the fusion ratio between them. Additionally, we introduce a similarity matching loss function to explicitly guide feature extraction. Our proposed method can extract discriminative features that facilitate the identification of the target, thus improving tracking performance. Extensive experimental results show that our proposed similarity feature extraction method can improve the tracking performance.

AB - In this work, we propose a visual tracking algorithm based on structural-appearance information fusion that aims to distinguish the target from distractors, including both semantical and visual distractors. It measures the similarity of targets using both appearance information and structural information, with the former extracted from siamese networks and the latter learned from appearance information using a target-cross attention mechanism. The structural and appearance information can be dynamically fused by using a gating recurrent unit, which can control the fusion ratio between them. Additionally, we introduce a similarity matching loss function to explicitly guide feature extraction. Our proposed method can extract discriminative features that facilitate the identification of the target, thus improving tracking performance. Extensive experimental results show that our proposed similarity feature extraction method can improve the tracking performance.

KW - Multi-information fusion

KW - Siamese networks

KW - Visual tracking

UR - http://www.scopus.com/inward/record.url?scp=85168441327&partnerID=8YFLogxK

U2 - 10.1007/s00371-023-03013-7

DO - 10.1007/s00371-023-03013-7

M3 - Article

AN - SCOPUS:85168441327

SN - 0178-2789

VL - 40

SP - 3103

EP - 3117

JO - Visual Computer

JF - Visual Computer

IS - 5

ER -
