TY - GEN
T1 - DFVNet
T2 - 9th International Conference on Vision, Image and Signal Processing, ICVISP 2025
AU - Lv, Junhe
AU - Chen, Linwei
AU - Fu, Ying
AU - Yin, Jun
AU - Wang, Yayun
AU - Yan, Chenggang
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Detecting small objects is a critical task in various application domains, including autonomous driving and drone recognition. However, existing small object detection methods, which are predominantly based on single images, often fail to capture essential information due to issues such as density and motion blur. These methods also neglect the temporal context inherent in real-world scenarios, resulting in suboptimal detection performance. To address these limitations, this paper proposes a novel video-based Detection Transformer Network, named DFVNet, for small object detection. DFVNet leverages density features to extract information from densely packed s and enhances the current detection frame through a coarse-to-fine refinement process, utilizing reference features from adjacent temporal frames. Extensive experiments on the VisDrone2019-VID dataset show that DFVNet improves average precision AP50 by 3.0% compared to image-based detection methods and by 4.1% over other video-based detection methods.
KW - Density Feature
KW - Dynamic Query
KW - Multi-Frame Information Fusion
KW - Small Object Detection in Video
UR - https://www.scopus.com/pages/publications/105036393732
U2 - 10.1109/ICVISP68610.2025.11451735
DO - 10.1109/ICVISP68610.2025.11451735
M3 - Conference contribution
AN - SCOPUS:105036393732
T3 - ICVISP 2025 Proceedings - 2025 9th International Conference on Vision, Image and Signal Processing
BT - ICVISP 2025 Proceedings - 2025 9th International Conference on Vision, Image and Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 November 2025 through 30 November 2025
ER -