Occlusion and Deformation Handling Visual Tracking for UAV via Attention-Based Mask Generative Network

Yashuo Bai, Yong Song, Yufei Zhao*, Ya Zhou, Xiyan Wu, Yuxin He, Zishuo Zhang, Xin Yang, Qun Hao

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

7 Citations (Scopus)

Abstract

Although unmanned aerial vehicle (UAV) tracking has benefited from the successful application of discriminative correlation filters (DCF) and convolutional neural networks (CNNs), tracking under occlusion and deformation remains a challenge. The main difficulty is that such scenes are complex and changeable, so it is hard to obtain training data covering all situations, and trained networks may be confused by new contexts that differ from historical information. Data-driven strategies are the main direction of current solutions, but gathering large-scale datasets with object instances under various occlusion and deformation conditions is difficult, and the resulting datasets lack diversity. This paper proposes an attention-based mask generation network (AMGN) for UAV-specific tracking, which combines the attention mechanism and adversarial learning to improve the tracker’s ability to handle occlusion and deformation. After the base CNN extracts the deep features of the candidate region, a series of masks is determined by the spatial attention module and sent to the generator, which discards some features according to these masks to simulate occlusion and deformation of the object, producing more hard positive samples. The discriminator seeks to distinguish these hard positive samples while guiding mask generation. Such adversarial learning effectively supplements occluded and deformed positive samples in the feature space, allowing the tracker to capture more robust features for distinguishing objects from backgrounds. Comparative experiments show that our AMGN-based tracker achieves the highest area under curve (AUC) of 0.490 and 0.349, and the highest precision scores of 0.742 and 0.662, on the UAV123 tracking benchmark with partial and full occlusion attributes, respectively. It also achieves the highest AUC of 0.555 and the highest precision score of 0.797 on the DTB70 tracking benchmark with the deformation attribute. On the UAVDT tracking benchmark with the large occlusion attribute, it achieves the highest AUC of 0.407 and the highest precision score of 0.582.
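The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: in AMGN the attention module, generator, and discriminator are learned networks, whereas here the attention map is a fixed channel-wise energy heuristic, the masks are square blocks, and `score_fn` is a hypothetical stand-in for the classifier. All function names and parameters (`block`, `top_k`) are assumptions made for illustration.

```python
import numpy as np

def spatial_attention(features):
    """Collapse a (C, H, W) feature map into an (H, W) map scaled to [0, 1].

    Assumption: mean absolute activation stands in for a learned attention module.
    """
    energy = np.abs(features).mean(axis=0)
    span = energy.max() - energy.min()
    if span == 0:
        return np.zeros_like(energy)
    return (energy - energy.min()) / span

def candidate_masks(attention, block=2, top_k=3):
    """Return top_k binary masks, each zeroing one block x block high-attention region."""
    h, w = attention.shape
    scored = []
    for i in range(h - block + 1):
        for j in range(w - block + 1):
            scored.append((attention[i:i + block, j:j + block].sum(), i, j))
    scored.sort(key=lambda t: t[0], reverse=True)
    masks = []
    for _, i, j in scored[:top_k]:
        mask = np.ones((h, w))
        mask[i:i + block, j:j + block] = 0.0  # simulated occluded region
        masks.append(mask)
    return masks

def apply_mask(features, mask):
    """Drop the masked spatial positions in every channel."""
    return features * mask[None, :, :]

def hardest_mask(features, masks, score_fn):
    """Pick the mask whose masked features get the lowest score.

    score_fn is a hypothetical classifier; a low score marks a hard positive sample.
    """
    scores = [score_fn(apply_mask(features, m)) for m in masks]
    return masks[int(np.argmin(scores))]
```

For a feature map `feats` of shape `(C, H, W)`, calling `hardest_mask(feats, candidate_masks(spatial_attention(feats)), score_fn)` yields one mask whose application produces a hard positive sample in feature space. The real method trains mask generation adversarially, so this sketch conveys only the data flow.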

Original language: English
Article number: 4756
Journal: Remote Sensing
Volume: 14
Issue: 19
DOI
Publication status: Published - Oct 2022
