Capturing Relevant Context for Visual Tracking

Yuping Zhang, Bo Ma*, Jiahao Wu, Lianghua Huang, Jianbing Shen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Studies have shown that contextual information can improve the robustness of trackers. However, trackers based on convolutional neural networks (CNNs) capture only local features, which limits their performance. We propose a novel relevant context block (RCB), which employs graph convolutional networks to capture relevant context. In particular, for each query position (unit), it selects the k largest contributors, which contain meaningful and discriminative contextual information, as nodes, and updates the query position by aggregating the differences between it and its contributors. This operation can be easily incorporated into existing networks and trained end-to-end with a standard backpropagation algorithm. To verify the effectiveness of RCB, we apply it to two trackers, SiamFC and GlobalTrack, yielding the improved trackers Siam-RCB and GlobalTrack-RCB. Extensive experiments on OTB, VOT, UAV123, LaSOT, TrackingNet, OxUvA, and VOT2018LT show the superiority of our method. For example, Siam-RCB outperforms SiamFC by a very large margin (up to 11.2% in the success score and 7.8% in the precision score) on the OTB-100 benchmark.
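The core update described in the abstract (select the k largest contributors for each query unit, then aggregate the differences between the query and those contributors) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's exact formulation: the contribution measure (dot-product similarity), the mean aggregation, and the function name are all hypothetical.

```python
import numpy as np

def relevant_context_block(feats, k=3):
    """Hedged sketch of an RCB-style update.

    feats: (N, C) array of unit features.
    For each query unit, the k most-contributing other units (here scored
    by dot-product similarity, an assumption) are selected as graph nodes,
    and the query is updated by aggregating the differences between it and
    those contributors.
    """
    n, _ = feats.shape
    sim = feats @ feats.T                  # pairwise contribution scores (assumed)
    np.fill_diagonal(sim, -np.inf)         # a unit is not its own contributor
    out = feats.astype(float).copy()
    for i in range(n):
        topk = np.argsort(sim[i])[-k:]     # indices of the k largest contributors
        diff = feats[topk] - feats[i]      # differences to the query position
        out[i] = feats[i] + diff.mean(axis=0)  # aggregate and update (mean is assumed)
    return out
```

Because the update is a plain feature-to-feature map, it can be dropped between layers of an existing backbone, which is consistent with the abstract's claim that RCB is easy to incorporate and train end-to-end.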

Original language: English
Pages (from-to): 4232-4244
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 23
DOIs
Publication status: Published - 2021

Keywords

  • Local neighborhood graph
  • long-range dependencies
  • long-term tracking
  • visual object tracking

