TY - JOUR
T1 - Deep Siamese Cross-Residual Learning for Robust Visual Tracking
AU - Wu, Fan
AU - Xu, Tingfa
AU - Guo, Jie
AU - Huang, Bo
AU - Xu, Chang
AU - Wang, Jihui
AU - Li, Xiangmin
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2021/10/15
Y1 - 2021/10/15
AB - Sixth-generation (6G) wireless technology contributes to the establishment of the Internet of Things (IoT). The IoT has recently become popular because of its smart architectures and varied applications. Among these applications, intelligent urban surveillance systems for smart cities are becoming increasingly important, which makes designing a robust visual tracking method an urgent task. Deep Siamese convolutional neural networks have recently been applied to visual tracking because of their ability to learn a matching function between the template and the target candidate. Unlike traditional Siamese networks, which treat the two branches separately, we propose deep Siamese cross-residual learning to entangle the two branches from the beginning to the end of the Siamese network. This strategy lets the two branches exchange instance-specific information at different nodes of the network and learn a more compact representation of the target. In addition, we propose a combined loss function that consists of two complementary tasks: one learns a matching function directly, and the other learns a classification function. Moreover, our model does not need to load any pretrained weights and is trained from scratch on a limited number of sequences. Extensive experiments show that our tracker performs favorably against many state-of-the-art tracking methods.
KW - Convolutional neural network (CNN)
KW - Internet of Things (IoT)
KW - Siamese cross-residual learning
KW - Deep learning
KW - Visual tracking
UR - http://www.scopus.com/inward/record.url?scp=85098774113&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2020.3041052
DO - 10.1109/JIOT.2020.3041052
M3 - Article
AN - SCOPUS:85098774113
SN - 2327-4662
VL - 8
SP - 15216
EP - 15227
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 20
ER -