PL-MCT: pseudo-labeling and multi-frame consistency training for semi-supervised visual tracking

Yiqian Huang, Shuqi Liu, Fei Dong, Xu Li, Xin Yang, Ya Zhou, Jinxiang Huang, Yong Song*

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

The exploitation of unlabeled videos for visual object tracking has recently drawn increasing attention. However, unreliable pseudo-labels yield an incomplete object appearance and an incorrect search region, which hinders the learning of bounding box regression. To address this issue, we propose a novel semi-supervised learning method for visual tracking, termed pseudo-labeling and multi-frame consistency training (PL-MCT), which improves the reliability of pseudo-labels and the robustness of the tracker. Specifically, we introduce a pseudo-label evaluation (PLE) module that provides a reliability score for each pseudo-label, and we design a prediction-training alternation (PTA) strategy that mitigates the bias of noisy pseudo-labels; together, these help select high-quality pseudo-labels as training pairs. Meanwhile, to cope with the appearance variations of objects in complex scenarios, we employ a multi-frame consistency training scheme that introduces an online update head (OUH), which continues training the tracker to learn the temporal signal in videos and to update online. Extensive experiments demonstrate the effectiveness of the proposed method. PL-MCT achieves a precision score of 0.856 on OTB2015 and 0.408 on LaSOT, a strong result relative to other unsupervised methods and comparable to earlier supervised methods. The project will be available at https://github.com/HYQ-hyq222/PL-MCT.
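
The selection step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the `reliability` field merely stands in for whatever score the PLE module produces, and the names `PseudoLabel`, `select_reliable`, and `make_training_pairs`, the 0.7 threshold, and the 10-frame gap are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class PseudoLabel:
    frame_index: int
    box: Box
    # Assumption: a score in [0, 1] standing in for the PLE module's output.
    reliability: float


def select_reliable(labels: List[PseudoLabel], threshold: float = 0.7) -> List[PseudoLabel]:
    """Keep only pseudo-labels whose reliability meets the (hypothetical) threshold."""
    return [label for label in labels if label.reliability >= threshold]


def make_training_pairs(
    labels: List[PseudoLabel], max_gap: int = 10
) -> List[Tuple[PseudoLabel, PseudoLabel]]:
    """Pair each retained frame (template) with later retained frames (search)
    that lie at most max_gap frames ahead."""
    ordered = sorted(labels, key=lambda label: label.frame_index)
    pairs = []
    for i, template in enumerate(ordered):
        for search in ordered[i + 1:]:
            if search.frame_index - template.frame_index > max_gap:
                break  # later frames are even further away
            pairs.append((template, search))
    return pairs


if __name__ == "__main__":
    candidates = [
        PseudoLabel(0, (10.0, 10.0, 40.0, 40.0), 0.9),
        PseudoLabel(3, (12.0, 11.0, 20.0, 15.0), 0.4),  # low score: dropped
        PseudoLabel(5, (15.0, 12.0, 41.0, 39.0), 0.8),
    ]
    kept = select_reliable(candidates)
    for template, search in make_training_pairs(kept):
        print(template.frame_index, "->", search.frame_index)
```

In a prediction-training alternation loop of the kind the abstract names, a filter like this would presumably be re-run after each training round, so that the tracker's improved predictions feed the next round of pseudo-labels; how the paper actually schedules this is not stated in the abstract.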

Original language: English
Journal: Visual Computer
DOI: https://doi.org/10.1007/s00371-024-03651-5
Publication status: Accepted/In press - 2024

Keywords

  • Consistency training
  • Pseudo-label
  • Semi-supervised
  • Visual tracking

Cite this

Huang, Y., Liu, S., Dong, F., Li, X., Yang, X., Zhou, Y., Huang, J., & Song, Y. (Accepted/In press). PL-MCT: pseudo-labeling and multi-frame consistency training for semi-supervised visual tracking. Visual Computer. https://doi.org/10.1007/s00371-024-03651-5