TY - GEN
T1 - Full Attention Tracker
T2 - 42nd Chinese Control Conference, CCC 2023
AU - Wang, Yuxuan
AU - Yan, Liping
AU - Feng, Zihang
AU - Xia, Yuanqing
AU - Xiao, Bo
N1 - Publisher Copyright:
© 2023 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2023
Y1 - 2023
N2 - Trackers based on Siamese neural networks are currently among the most accurate methods in visual tracking. With the introduction of the Transformer into visual tracking, attention mechanisms have gradually been adopted in tracking tasks. However, owing to the nature of the attention operation, Transformers usually converge slowly, and their pixel-level correlation discrimination is prone to overfitting, which hinders long-term tracking. A new framework, FAT (Full Attention Tracker), is designed as an improvement on MixFormer. MixFormer's simultaneous feature extraction and target information integration is retained, and a Mixing block is introduced to suppress the background as much as possible before information interaction. In addition, a new operation is designed: the result of region-level cross-correlation is used as guidance for learning the pixel-level cross-correlation in attention, thereby accelerating model convergence and enhancing model generalization. Finally, a joint loss function is designed to further improve the accuracy of the model. Experiments show that the presented tracker achieves excellent performance on five benchmark datasets.
AB - Trackers based on Siamese neural networks are currently among the most accurate methods in visual tracking. With the introduction of the Transformer into visual tracking, attention mechanisms have gradually been adopted in tracking tasks. However, owing to the nature of the attention operation, Transformers usually converge slowly, and their pixel-level correlation discrimination is prone to overfitting, which hinders long-term tracking. A new framework, FAT (Full Attention Tracker), is designed as an improvement on MixFormer. MixFormer's simultaneous feature extraction and target information integration is retained, and a Mixing block is introduced to suppress the background as much as possible before information interaction. In addition, a new operation is designed: the result of region-level cross-correlation is used as guidance for learning the pixel-level cross-correlation in attention, thereby accelerating model convergence and enhancing model generalization. Finally, a joint loss function is designed to further improve the accuracy of the model. Experiments show that the presented tracker achieves excellent performance on five benchmark datasets.
KW - Attention
KW - Correlation
KW - Transformer
KW - Visual Tracking
UR - http://www.scopus.com/inward/record.url?scp=85175577669&partnerID=8YFLogxK
U2 - 10.23919/CCC58697.2023.10240179
DO - 10.23919/CCC58697.2023.10240179
M3 - Conference contribution
AN - SCOPUS:85175577669
T3 - Chinese Control Conference, CCC
SP - 7440
EP - 7446
BT - 2023 42nd Chinese Control Conference, CCC 2023
PB - IEEE Computer Society
Y2 - 24 July 2023 through 26 July 2023
ER -