DLA-MOT: A Deep Layer Aggregation Framework for Robust Multi-Object Tracking in UAV Image

  • Yan Ding*
  • Minjin Zhao
  • Yuchen Ling
  • Zhizhen Rao
  • Ping Song
  • Yetao Cen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To meet the stringent demands of robust and real-time multi-object tracking (MOT) for unmanned-aerial-vehicle (UAV) swarms operating in complex battlefields, we design DLA-MOT, a joint detection-and-tracking framework that synergistically integrates multi-task learning with deep-layer-aggregation. A tree-structured DLA network is designed to iteratively fuse cross-scale semantic and fine-grained features via dense skip connections, yielding a 2.1× gain in representation quality for objects under 32 × 32 pixels. Complementarily, a weighted multi-task loss simultaneously supervises heat-map regression, bounding-box refinement and identity discrimination, enabling balanced co-optimization of detection and re-identification objectives. On top of these, a three-level data-association cascade progressively resolves occlusion, mis-detection and identity recovery by fusing appearance-motion distance, IoU matching and trajectory-confidence gating. Extensive evaluation on the VisDrone-MOT benchmark shows that DLA-MOT attains 30.3 % mAP and 37.3 % mAR, surpassing YOLOX-M, YOLOv11-M and RT-DETR-L by 13.4 %, 13.2 % and 15.5 %, respectively. Tracking performance reaches 47.5 % IDF1 and 32.5 % MOTA while reducing false positives, false negatives and identity switches by 11.3 %, 10.6 % and 32.9 % against the baseline. These results corroborate the framework's robustness and real-time capability across multi-scale, multi-illumination UAV scenarios.
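The three-level association cascade named in the abstract can be illustrated with a minimal sketch. The abstract gives no algorithmic detail, so everything below is an assumption made for illustration: the function names (`iou`, `greedy_match`, `cascade_associate`), the thresholds (`max_dist`, `min_iou`, `conf_gate`), the use of greedy matching in place of an optimal assignment solver, and the reading of "trajectory-confidence gating" as keeping unmatched tracks alive only above a confidence threshold. It is not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(cost: Callable[[int, int], float], rows: Sequence[int],
                 cols: Sequence[int], max_cost: float) -> List[Tuple[int, int]]:
    """Pair rows with cols in order of increasing cost, skipping costs above max_cost.

    Greedy matching is a simplification chosen here; it is not an optimal assignment.
    """
    candidates = sorted((cost(r, c), r, c) for r in rows for c in cols)
    used_rows, used_cols, matches = set(), set(), []
    for value, r, c in candidates:
        if value > max_cost:
            break  # candidates are sorted, so nothing cheaper remains
        if r in used_rows or c in used_cols:
            continue
        used_rows.add(r)
        used_cols.add(c)
        matches.append((r, c))
    return matches


def cascade_associate(track_boxes: Sequence[Box], det_boxes: Sequence[Box],
                      fused_dist: Sequence[Sequence[float]],
                      track_conf: Sequence[float],
                      max_dist: float = 0.4, min_iou: float = 0.3,
                      conf_gate: float = 0.5):
    """Hypothetical three-level cascade; all thresholds are illustrative assumptions."""
    tracks = list(range(len(track_boxes)))
    dets = list(range(len(det_boxes)))

    # Level 1: match on a precomputed fused appearance-motion distance.
    level1 = greedy_match(lambda t, d: fused_dist[t][d], tracks, dets, max_dist)
    tracks = [t for t in tracks if t not in {m[0] for m in level1}]
    dets = [d for d in dets if d not in {m[1] for m in level1}]

    # Level 2: match the leftovers on box overlap (cost = 1 - IoU).
    level2 = greedy_match(lambda t, d: 1.0 - iou(track_boxes[t], det_boxes[d]),
                          tracks, dets, 1.0 - min_iou)
    tracks = [t for t in tracks if t not in {m[0] for m in level2}]
    dets = [d for d in dets if d not in {m[1] for m in level2}]

    # Level 3: gate still-unmatched tracks on trajectory confidence.
    kept = [t for t in tracks if track_conf[t] >= conf_gate]
    dropped = [t for t in tracks if track_conf[t] < conf_gate]
    return level1 + level2, kept, dropped, dets
```

In this sketch, tracks in `kept` would remain candidates for identity recovery in later frames, tracks in `dropped` would be terminated, and the returned unmatched detections would seed new tracks. A production tracker would more likely solve each level with an optimal assignment method rather than the greedy pairing used here.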

Original language: English
Title of host publication: 2025 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 60-71
Number of pages: 12
ISBN (Electronic): 9781665457798
Publication status: Published - 2025
Externally published: Yes
Event: 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025 - Beijing, China
Duration: 22 Aug 2025 to 24 Aug 2025

Publication series

Name: 2025 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025

Conference

Conference: 4th International Conference on Intelligent Mechanical and Human-Computer Interaction Technology, IHCIT 2025
Country/Territory: China
City: Beijing
Period: 22/08/25 to 24/08/25

Keywords

  • DLA network
  • multi-task learning
  • small object detection
  • UAV multi-object tracking
