TY - JOUR
T1 - Multi-scale object detection by top-down and bottom-up feature pyramid network
AU - Zhao, Baojun
AU - Zhao, Boya
AU - Tang, Linbo
AU - Wang, Wenzheng
AU - Wu, Chen
N1 - Publisher Copyright: © 1990-2011 Beijing Institute of Aerospace Information.
PY - 2019/2
Y1 - 2019/2
AB - Advances in object detection technology, especially deep neural networks, have brought great success to many related tasks, such as medical applications and industrial automation. However, detecting objects with multiple aspect ratios and scales remains a key problem. This paper proposes a top-down and bottom-up feature pyramid network (TDBU-FPN), which combines multi-scale feature representation with anchor generation at multiple aspect ratios. First, to build the multi-scale feature maps, a number of fully convolutional layers are placed after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted: the top-down flow, implemented by deconvolution, introduces context information, and the bottom-up flow, implemented by pooling, supplements sub-original information. Third, the problem of adapting to different object aspect ratios is tackled by placing anchors of many shapes with different aspect ratios on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) dataset and reaches an accuracy of 79%, a 1.8% improvement, at a detection speed of 23 fps.
KW - convolutional neural network (CNN)
KW - deconvolution
KW - feature pyramid network (FPN)
KW - object detection
UR - http://www.scopus.com/inward/record.url?scp=85062697084&partnerID=8YFLogxK
U2 - 10.21629/JSEE.2019.01.01
DO - 10.21629/JSEE.2019.01.01
M3 - Article
AN - SCOPUS:85062697084
SN - 1671-1793
VL - 30
SP - 1
EP - 12
JO - Journal of Systems Engineering and Electronics
JF - Journal of Systems Engineering and Electronics
IS - 1
ER -