TY - JOUR
T1 - Metric learning based structural appearance model for robust visual tracking
AU - Wu, Yuwei
AU - Ma, Bo
AU - Yang, Min
AU - Zhang, Jian
AU - Jia, Yunde
PY - 2014/5
Y1 - 2014/5
AB - Appearance modeling is a key issue for the success of a visual tracker. Sparse representation based appearance modeling has received increasing interest in recent years. However, most existing work uses reconstruction errors to compute the observation likelihood under the generative framework, which may perform poorly, especially under significant appearance variations. In this paper, we advocate an approach to visual tracking that seeks an appropriate metric in the feature space of sparse codes, and we propose a metric learning based structural appearance model for more accurate matching of different appearances. This structural representation is obtained by performing multiscale max pooling on the weighted local sparse codes of image patches. We propose an online multiple instance metric learning algorithm that learns a discriminative and adaptive metric, thereby better distinguishing the visual object of interest from the background. The multiple instance setting helps alleviate the drift problem potentially caused by misaligned training examples. Tracking is then carried out within a Bayesian inference framework, in which the learned metric and the structural object representation are used to construct the observation model. Comprehensive experiments on challenging image sequences demonstrate qualitatively and quantitatively that the proposed algorithm outperforms state-of-the-art methods.
KW - Appearance modeling
KW - multiple instance metric learning
KW - multiscale max pooling
KW - object tracking
KW - sparse coding
UR - http://www.scopus.com/inward/record.url?scp=84900529184&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2013.2291283
DO - 10.1109/TCSVT.2013.2291283
M3 - Article
AN - SCOPUS:84900529184
SN - 1051-8215
VL - 24
SP - 865
EP - 877
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 5
M1 - 6665059
ER -