Abstract
In this letter, we cast visual tracking as a template matching problem in a Siamese deep convolutional neural network architecture. In contrast to traditional and other deep feature-based tracking methods, the proposed model exploits multilevel convolutional features from a partial view. The model matches the candidate patch against the template patch along the feature dimension of the convolutional features, which yields hundreds of thousands of base matchers. Base matchers built on low-level convolutional features have small receptive fields that contain partial details of the target, while base matchers built on high-level convolutional features have large receptive fields that capture semantic information about the target. The final strong matcher is a weighted ensemble of all the base matchers. We design an effective weight propagation strategy to update the weights of the base matchers. Moreover, for robustness, we propose to use cosine similarity as the distance metric and a customized squared-loss function as the cost function. Experiments show that our tracker outperforms state-of-the-art trackers in a wide range of tracking scenarios.
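The weighted-ensemble idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one base matcher per feature channel, takes precomputed feature maps as plain NumPy arrays, and uses fixed weights, whereas the abstract does not specify how base matchers are defined, how weights are propagated, or the form of the loss. The names `channel_cosine` and `ensemble_score` are hypothetical.

```python
import numpy as np


def channel_cosine(template, candidate, eps=1e-8):
    """Cosine similarity per feature channel (assumed base matcher).

    template, candidate: arrays of shape (C, H, W) from the same layer.
    Returns an array of shape (C,), one score per channel.
    """
    t = template.reshape(template.shape[0], -1)
    c = candidate.reshape(candidate.shape[0], -1)
    num = np.sum(t * c, axis=1)
    # eps guards against division by zero for all-zero channels.
    den = np.linalg.norm(t, axis=1) * np.linalg.norm(c, axis=1) + eps
    return num / den


def ensemble_score(template_feats, candidate_feats, weights):
    """Weighted sum of base-matcher scores pooled over several layers.

    template_feats, candidate_feats: lists of (C_l, H_l, W_l) arrays,
    one per layer. weights: 1-D array with one entry per channel in total.
    """
    scores = np.concatenate(
        [channel_cosine(t, c) for t, c in zip(template_feats, candidate_feats)]
    )
    return float(np.dot(weights, scores))


# Illustrative usage with random stand-ins for low- and high-level features.
rng = np.random.default_rng(0)
template = [rng.standard_normal((4, 8, 8)), rng.standard_normal((6, 4, 4))]
candidate = [f + 0.1 * rng.standard_normal(f.shape) for f in template]
weights = np.full(10, 0.1)  # uniform weights over 4 + 6 channels (assumption)
score = ensemble_score(template, candidate, weights)
```

In a tracker built along these lines, the candidate patch with the highest ensemble score would be selected as the new target location, and the weights would then be updated from frame to frame; that update step is left out here because the abstract gives no details of it.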
Original language | English |
---|---|
Article number | 8026140 |
Pages (from-to) | 1562-1566 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 24 |
Issue number | 10 |
Publication status | Published - Oct 2017 |
Keywords
- Convolutional neural network (CNN)
- Siamese neural network
- ensemble tracking
- template matching