Abstract
The exploitation of unlabeled videos for visual object tracking has recently drawn increasing attention. However, unreliable pseudo-labels lead to incomplete object appearance and incorrect search regions, which hinders the learning of bounding box regression. To address this issue, we propose a novel semi-supervised learning method for visual tracking, termed pseudo-labeling and multi-frame consistency training (PL-MCT), which improves both the reliability of pseudo-labels and the robustness of the tracker. Specifically, we introduce a pseudo-label evaluation (PLE) module that assigns a reliability score to each pseudo-label, and we design a prediction-training alternation (PTA) strategy that mitigates the bias introduced by noisy pseudo-labels. Together, these components help select high-quality pseudo-labels as training pairs. Meanwhile, to cope with object appearance variations in complex scenarios, we employ a multi-frame consistency training scheme that introduces an online update head (OUH), which continues training the tracker to learn signals along the temporal dimension of videos and updates it online. Extensive experiments demonstrate the effectiveness of the proposed method. PL-MCT achieves precision scores of 0.856 on OTB2015 and 0.408 on LaSOT, performing favorably against other unsupervised methods and comparably to earlier supervised methods. The project will be available at https://github.com/HYQ-hyq222/PL-MCT.
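For context on the reported numbers: under the standard OTB-style protocol, a tracker's precision score is the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center. The sketch below computes this for a single sequence, assuming boxes given as (x, y, w, h) and that 20-pixel threshold; the function names are illustrative only and are not taken from the PL-MCT code, and benchmark toolkits typically average such per-sequence scores over all sequences.

```python
import math
from typing import Sequence

# Hypothetical box format for this sketch: (x, y, w, h) = top-left corner, width, height.
Box = Sequence[float]


def center_location_error(pred: Box, gt: Box) -> float:
    """Euclidean distance in pixels between the centers of two boxes."""
    px, py = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx, gy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    return math.hypot(px - gx, py - gy)


def precision_score(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 20.0) -> float:
    """Fraction of frames whose center location error is at most `threshold` pixels."""
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("need equal-length, non-empty sequences of boxes")
    hits = sum(center_location_error(p, g) <= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


if __name__ == "__main__":
    gts = [[0, 0, 10, 10]] * 4
    # Center errors: 0, 5, 30 and 25 pixels, so 2 of 4 frames fall within 20 px.
    preds = [[0, 0, 10, 10], [5, 0, 10, 10], [30, 0, 10, 10], [0, 25, 10, 10]]
    print(precision_score(preds, gts))  # 0.5
```

Sweeping `threshold` over a range of values yields the precision plot; the single number usually quoted is the value at 20 pixels.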
Original language | English |
---|---|
Journal | Visual Computer |
DOI | |
Publication status | Accepted/In press - 2024 |