Deep Siamese Cross-Residual Learning for Robust Visual Tracking

Fan Wu; Tingfa Xu; Jie Guo; Bo Huang; Chang Xu; Jihui Wang; Xiangmin Li

doi:10.1109/JIOT.2020.3041052

Fan Wu, Tingfa Xu^*, Jie Guo, Bo Huang, Chang Xu, Jihui Wang, Xiangmin Li

^*Corresponding author for this work

School of Optics and Photonics

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

The sixth-generation (6G) wireless technology contributes to the establishment of the Internet of Things (IoT). Recently, the IoT has become popular because of its smart architectures and various applications. Among these applications, intelligent urban surveillance systems for smart cities are becoming increasingly important, so designing a robust visual tracking method has become an urgent task. Deep Siamese convolutional neural networks have recently been applied to visual tracking because of their ability to learn a matching function between the template and the target candidate. Unlike traditional Siamese networks, which treat the two branches separately, we propose deep Siamese cross-residual learning to entangle the two branches from the beginning to the end of the Siamese network. This strategy lets the two branches exchange instance-specific information at different nodes of the network and learn a more compact representation of the target. In addition, we propose a combined loss function that consists of two complementary tasks: one learns a matching function directly, and the other learns a classification function. Moreover, our model does not need to load any pretrained weights and is trained from scratch on a limited number of sequences. Extensive experiments show that our tracker performs favorably against many state-of-the-art tracking methods.
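As a rough illustration of the combined loss mentioned in the abstract, the sketch below adds a matching term to a classification term. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the logistic form of the matching loss, the softmax cross-entropy form of the classification loss, the `weight` parameter, and all function names are hypothetical and may differ from what the paper uses.

```python
import math
from typing import Sequence


def logistic_matching_loss(score: float, label: int) -> float:
    """Logistic loss log(1 + exp(-label * score)) on a similarity score.

    Assumption: label is +1 when the candidate matches the template
    and -1 otherwise. Computed in a numerically stable form.
    """
    z = -label * score
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def cross_entropy_loss(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def combined_loss(
    score: float,
    label: int,
    logits: Sequence[float],
    target: int,
    weight: float = 1.0,  # hypothetical balance between the two tasks
) -> float:
    """Sum of the matching term and the weighted classification term."""
    return logistic_matching_loss(score, label) + weight * cross_entropy_loss(
        logits, target
    )


if __name__ == "__main__":
    # A positive pair with a high similarity score and a confident,
    # correct class prediction gives a small total loss.
    print(combined_loss(score=3.0, label=1, logits=[4.0, 0.5], target=0))
```

Training on both terms at once is one common way to make two complementary objectives shape the same features; how the paper balances them is not stated in the abstract.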

Original language	English
Pages (from-to)	15216-15227
Number of pages	12
Journal	IEEE Internet of Things Journal
Volume	8
Issue number	20
DOIs	https://doi.org/10.1109/JIOT.2020.3041052
Publication status	Published - 15 Oct 2021

Keywords

Convolutional neural network (CNN)
Internet of Things (IoT)
Siamese cross-residual learning
deep learning
visual tracking

Cite this

Wu, F., Xu, T., Guo, J., Huang, B., Xu, C., Wang, J., & Li, X. (2021). Deep Siamese Cross-Residual Learning for Robust Visual Tracking. IEEE Internet of Things Journal, 8(20), 15216-15227. https://doi.org/10.1109/JIOT.2020.3041052

@article{e85f9fb3353f4eaea01c889fb56abcae,

title = "Deep Siamese Cross-Residual Learning for Robust Visual Tracking",

abstract = "The sixth-generation (6G) wireless technology contributes to the establishment of the Internet of Things (IoT). Recently, the IoT has become popular because of its smart architectures and various applications. Among these applications, intelligent urban surveillance systems for smart cities are becoming more and more important. Therefore, designing a robust visual tracking method has become an urgent task. Deep Siamese convolutional neural networks have been applied to visual tracking recently because of their advantageous abilities to learn a matching function between the template and the target candidate. Unlike traditional Siamese networks, which separately treat the two branches, we propose deep Siamese cross-residual learning to entangle the two branches from the beginning to the end of the Siamese network. This strategy can make the two branches exchange instance-specific information at different nodes of the network and learn a more compact representation of the target. In addition, we propose a combined loss function, which consists of two complementary tasks. One task is to learn a matching function directly and the other one is to learn a classification function. Moreover, our model does not need to load any pretrained weights and is trained with limited sequences from scratch. Plenty of experiments show that our tracker performs favorably against many state-of-the-art tracking methods.",

keywords = "Convolutional neural network (CNN), Internet of Things (IoT), Siamese cross-residual learning, deep learning, visual tracking",

author = "Fan Wu and Tingfa Xu and Jie Guo and Bo Huang and Chang Xu and Jihui Wang and Xiangmin Li",

note = "Publisher Copyright: {\textcopyright} 2014 IEEE.",

year = "2021",

month = oct,

day = "15",

doi = "10.1109/JIOT.2020.3041052",

language = "English",

volume = "8",

pages = "15216--15227",

journal = "IEEE Internet of Things Journal",

issn = "2327-4662",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "20",

}

TY - JOUR

T1 - Deep Siamese Cross-Residual Learning for Robust Visual Tracking

AU - Wu, Fan

AU - Xu, Tingfa

AU - Guo, Jie

AU - Huang, Bo

AU - Xu, Chang

AU - Wang, Jihui

AU - Li, Xiangmin

PY - 2021/10/15

Y1 - 2021/10/15

AB - The sixth-generation (6G) wireless technology contributes to the establishment of the Internet of Things (IoT). Recently, the IoT has become popular because of its smart architectures and various applications. Among these applications, intelligent urban surveillance systems for smart cities are becoming more and more important. Therefore, designing a robust visual tracking method has become an urgent task. Deep Siamese convolutional neural networks have been applied to visual tracking recently because of their advantageous abilities to learn a matching function between the template and the target candidate. Unlike traditional Siamese networks, which separately treat the two branches, we propose deep Siamese cross-residual learning to entangle the two branches from the beginning to the end of the Siamese network. This strategy can make the two branches exchange instance-specific information at different nodes of the network and learn a more compact representation of the target. In addition, we propose a combined loss function, which consists of two complementary tasks. One task is to learn a matching function directly and the other one is to learn a classification function. Moreover, our model does not need to load any pretrained weights and is trained with limited sequences from scratch. Plenty of experiments show that our tracker performs favorably against many state-of-the-art tracking methods.

KW - Convolutional neural network (CNN)

KW - Internet of Things (IoT)

KW - Siamese cross-residual learning

KW - deep learning

KW - visual tracking

UR - http://www.scopus.com/inward/record.url?scp=85098774113&partnerID=8YFLogxK

U2 - 10.1109/JIOT.2020.3041052

DO - 10.1109/JIOT.2020.3041052

M3 - Article

AN - SCOPUS:85098774113

SN - 2327-4662

VL - 8

SP - 15216

EP - 15227

JO - IEEE Internet of Things Journal

JF - IEEE Internet of Things Journal

IS - 20

ER -
