TY - GEN
T1 - DLA-MOT
T2 - 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025
AU - Ding, Yan
AU - Zhao, Minjin
AU - Ling, Yuchen
AU - Rao, Zhizhen
AU - Song, Ping
AU - Cen, Yetao
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - To meet the stringent demands of robust, real-time multi-object tracking (MOT) for unmanned aerial vehicle (UAV) swarms operating in complex battlefield environments, we design DLA-MOT, a joint detection-and-tracking framework that integrates multi-task learning with deep layer aggregation (DLA). A tree-structured DLA network iteratively fuses cross-scale semantic and fine-grained features via dense skip connections, yielding a 2.1× gain in representation quality for objects smaller than 32 × 32 pixels. Complementarily, a weighted multi-task loss jointly supervises heat-map regression, bounding-box refinement and identity discrimination, enabling balanced co-optimization of the detection and re-identification objectives. On top of these components, a three-level data-association cascade progressively resolves occlusion, missed detections and identity recovery by fusing appearance-motion distance, IoU matching and trajectory-confidence gating. Extensive evaluation on the VisDrone-MOT benchmark shows that DLA-MOT attains 30.3% mAP and 37.3% mAR, surpassing YOLOX-M, YOLOv11-M and RT-DETR-L by 13.4%, 13.2% and 15.5%, respectively. Tracking performance reaches 47.5% IDF1 and 32.5% MOTA while reducing false positives, false negatives and identity switches by 11.3%, 10.6% and 32.9% relative to the baseline. These results corroborate the framework's robustness and real-time capability across multi-scale, multi-illumination UAV scenarios.
AB - To meet the stringent demands of robust, real-time multi-object tracking (MOT) for unmanned aerial vehicle (UAV) swarms operating in complex battlefield environments, we design DLA-MOT, a joint detection-and-tracking framework that integrates multi-task learning with deep layer aggregation (DLA). A tree-structured DLA network iteratively fuses cross-scale semantic and fine-grained features via dense skip connections, yielding a 2.1× gain in representation quality for objects smaller than 32 × 32 pixels. Complementarily, a weighted multi-task loss jointly supervises heat-map regression, bounding-box refinement and identity discrimination, enabling balanced co-optimization of the detection and re-identification objectives. On top of these components, a three-level data-association cascade progressively resolves occlusion, missed detections and identity recovery by fusing appearance-motion distance, IoU matching and trajectory-confidence gating. Extensive evaluation on the VisDrone-MOT benchmark shows that DLA-MOT attains 30.3% mAP and 37.3% mAR, surpassing YOLOX-M, YOLOv11-M and RT-DETR-L by 13.4%, 13.2% and 15.5%, respectively. Tracking performance reaches 47.5% IDF1 and 32.5% MOTA while reducing false positives, false negatives and identity switches by 11.3%, 10.6% and 32.9% relative to the baseline. These results corroborate the framework's robustness and real-time capability across multi-scale, multi-illumination UAV scenarios.
KW - DLA network
KW - multi-task learning
KW - small object detection
KW - UAV multi-object tracking
UR - https://www.scopus.com/pages/publications/105022240522
U2 - 10.1109/IHCIT66787.2025.11198993
DO - 10.1109/IHCIT66787.2025.11198993
M3 - Conference contribution
AN - SCOPUS:105022240522
T3 - 2025 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025
SP - 60
EP - 71
BT - 2025 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 August 2025 through 24 August 2025
ER -