Capturing Relevant Context for Visual Tracking

Yuping Zhang, Bo Ma*, Jiahao Wu, Lianghua Huang, Jianbing Shen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Studies have shown that contextual information can improve the robustness of trackers. However, trackers based on convolutional neural networks (CNNs) capture only local features, which limits their performance. We propose a novel relevant context block (RCB), which employs graph convolutional networks to capture relevant context. In particular, for each query position (unit), it selects the k largest contributors, which contain meaningful and discriminative contextual information, as nodes, and updates the query position by aggregating the differences between it and its contributors. This operation can be easily incorporated into existing networks and trained end-to-end with a standard backpropagation algorithm. To verify the effectiveness of RCB, we apply it to two trackers, SiamFC and GlobalTrack, yielding the improved trackers Siam-RCB and GlobalTrack-RCB. Extensive experiments on OTB, VOT, UAV123, LaSOT, TrackingNet, OxUvA, and VOT2018LT show the superiority of our method. For example, Siam-RCB outperforms SiamFC by a very large margin (up to 11.2% in the success score and 7.8% in the precision score) on the OTB-100 benchmark.
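The core update described in the abstract (select the k largest contributors for each query unit, then aggregate the differences between the query and those contributors) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's exact formulation: the contribution measure (dot-product similarity), the mean aggregation, and the function name are all hypothetical.

```python
import numpy as np

def relevant_context_block(feats, k=3):
    """Hedged sketch of an RCB-style update.

    feats: (N, C) array of unit features.
    For each query unit, the k most-contributing other units (here scored
    by dot-product similarity, an assumption) are selected as graph nodes,
    and the query is updated by aggregating the differences between it and
    those contributors.
    """
    n, _ = feats.shape
    sim = feats @ feats.T                  # pairwise contribution scores (assumed)
    np.fill_diagonal(sim, -np.inf)         # a unit is not its own contributor
    out = feats.astype(float).copy()
    for i in range(n):
        topk = np.argsort(sim[i])[-k:]     # indices of the k largest contributors
        diff = feats[topk] - feats[i]      # differences to the query position
        out[i] = feats[i] + diff.mean(axis=0)  # aggregate and update (mean is assumed)
    return out
```

Because the update is a plain feature-to-feature map, it can be dropped between layers of an existing backbone, which is consistent with the abstract's claim that RCB is easy to incorporate and train end-to-end.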

Original language: English
Pages (from-to): 4232-4244
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 23
DOIs
Publication status: Published - 2021

Keywords

  • Local neighborhood graph
  • long-range dependencies
  • long-term tracking
  • visual object tracking

