TY - JOUR
T1 - Vision-based swarm tracking of multiple UAVs in air-to-air scenarios
AU - CHU, Zhaochen
AU - SONG, Tao
AU - JIN, Ren
AU - LIN, Defu
AU - SHEN, Hao
AU - LYU, Maolong
N1 - Publisher Copyright: © 2025 The Author(s)
PY - 2025/12
Y1 - 2025/12
AB - Vision-based air-to-air tracking of multi-UAV swarms is crucial for effective swarm perception. Because the UAVs in a swarm are usually of the same type, they share similar appearance features; they also exhibit nonlinear motion. These properties challenge existing multi-object tracking (MOT) algorithms, whose performance often degrades because instance-specific appearance and motion cues are difficult to capture. In this paper, we propose a novel multi-frame, pose-attention-based appearance feature extraction component that captures instance-level pose features of UAVs across consecutive frames. Additionally, we introduce a motion difference accumulation strategy to extract spatial and motion cues from multiple adjacent frames. By combining these techniques, we design a multi-frame association framework that effectively distinguishes between similar UAVs in a swarm by leveraging object features over consecutive frames. To address the lack of relevant datasets, we create the AIRMOT dataset, specifically tailored for air-to-air tracking of homogeneous UAV swarms. Our method is evaluated on the AIRMOT dataset as well as the publicly available MOT-FLY and UAVSwarm datasets. The experimental results demonstrate that our approach outperforms other state-of-the-art (SOTA) methods in tracking performance.
KW - Convolutional neural networks
KW - Feature extraction
KW - Swarm intelligence
KW - Target tracking
KW - Unmanned aerial vehicles
UR - https://www.scopus.com/pages/publications/105019185184
U2 - 10.1016/j.cja.2025.103558
DO - 10.1016/j.cja.2025.103558
M3 - Article
AN - SCOPUS:105019185184
SN - 1000-9361
VL - 38
JO - Chinese Journal of Aeronautics
JF - Chinese Journal of Aeronautics
IS - 12
M1 - 103558
ER -