TY - GEN
T1 - HRDNET
T2 - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
AU - Liu, Ziming
AU - Gao, Guangyu
AU - Sun, Lin
AU - Fang, Zhiyuan
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
AB - Small object detection is a challenging yet practical vision task. With deep network-based methods, the contextual information of small objects may disappear as the network goes deeper. An intuitive way to alleviate this issue is to increase the input resolution; however, doing so aggravates the large variance in object scale and introduces prohibitive computation cost. To leverage the benefits of high-resolution images without introducing new problems, we propose a High-Resolution Detection Network (HRDNet), which takes inputs at multiple resolutions and processes them with backbones of different depths. Within HRDNet, we propose the Multi-Depth Image Pyramid Network (MD-IPN) and the Multi-Scale Feature Pyramid Network (MS-FPN). The MD-IPN maintains positional information at multiple resolutions by using backbones of multiple depths. Specifically, the high-resolution input is fed into a shallow network to preserve more positional information and reduce computational cost, while the low-resolution input is fed into a deep network to extract more semantics. By extracting features from high to low resolutions, the MD-IPN improves small object detection while maintaining performance on medium and large objects. Additionally, the MS-FPN aligns and fuses the multi-scale feature groups generated by the MD-IPN to reduce information imbalance. Extensive experiments are conducted on COCO2017 and on a typical small object dataset, VisDrone 2019. Notably, HRDNet achieves state-of-the-art results on these two datasets, with significant improvements on small objects.
KW - Deep Neural Network
KW - High-resolution Images
KW - Image Pyramid
KW - Small Object Detection
UR - http://www.scopus.com/inward/record.url?scp=85126429420&partnerID=8YFLogxK
U2 - 10.1109/ICME51207.2021.9428241
DO - 10.1109/ICME51207.2021.9428241
M3 - Conference contribution
AN - SCOPUS:85126429420
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
PB - IEEE Computer Society
Y2 - 5 July 2021 through 9 July 2021
ER -