Capturing Relevant Context for Visual Tracking

Yuping Zhang, Bo Ma*, Jiahao Wu, Lianghua Huang, Jianbing Shen

*Corresponding author of this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Studies have shown that contextual information can promote the robustness of trackers. However, trackers based on convolutional neural networks (CNNs) capture only local features, which limits their performance. We propose a novel relevant context block (RCB), which employs graph convolutional networks to capture the relevant context. In particular, for each query position (unit) it selects the k largest contributors, which carry meaningful and discriminative contextual information, as nodes, and updates the nodes by aggregating the differences between the query position and its contributors. This operation can be easily incorporated into existing networks and trained end-to-end with a standard backpropagation algorithm. To verify the effectiveness of RCB, we apply it to two trackers, SiamFC and GlobalTrack; the two improved trackers are referred to as Siam-RCB and GlobalTrack-RCB. Extensive experiments on OTB, VOT, UAV123, LaSOT, TrackingNet, OxUvA, and VOT2018LT show the superiority of our method. For example, our Siam-RCB outperforms SiamFC by a very large margin (up to 11.2% in the success score and 7.8% in the precision score) on the OTB-100 benchmark.
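The core update the abstract describes — per query position, pick the k largest contributors and aggregate the feature differences between the query and those contributors — can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the dot-product affinity, the softmax weighting, and the residual update are all assumptions made for illustration.

```python
import numpy as np

def relevant_context_block(features, k=3):
    """Sketch of an RCB-style update: for each query position, select its
    k largest contributors by dot-product affinity and aggregate the
    feature differences (query minus contributor) as the node update."""
    n, c = features.shape
    affinity = features @ features.T              # (n, n) pairwise contributions
    np.fill_diagonal(affinity, -np.inf)           # exclude self-matches
    topk = np.argsort(affinity, axis=1)[:, -k:]   # indices of the k largest contributors
    sel = np.take_along_axis(affinity, topk, axis=1)
    # softmax over the selected affinities gives aggregation weights
    w = np.exp(sel - sel.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    diffs = features[:, None, :] - features[topk]     # (n, k, c) query - contributor
    update = (w[..., None] * diffs).sum(axis=1)       # weighted aggregation of differences
    return features + update                          # residual update of each node
```

In a real tracker the features would come from a CNN backbone and the block would be inserted into the network and trained end-to-end; here plain NumPy is used only to make the top-k selection and difference aggregation concrete.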

Original language: English
Pages (from-to): 4232-4244
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 23
DOI: 10.1109/TMM.2020.3038310
Publication status: Published - 2021


Cite this

Zhang, Y., Ma, B., Wu, J., Huang, L., & Shen, J. (2021). Capturing Relevant Context for Visual Tracking. IEEE Transactions on Multimedia, 23, 4232-4244. https://doi.org/10.1109/TMM.2020.3038310