Multi-scale object detection by top-down and bottom-up feature pyramid network

Zhao Baojun; Zhao Boya; Tang Linbo; Wang Wenzheng; Wu Chen

doi:10.21629/JSEE.2019.01.01

Zhao Baojun*, Zhao Boya, Tang Linbo, Wang Wenzheng, Wu Chen

* Corresponding author for this work

School of Information and Electronics

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

Advances in object detection technology, especially deep neural networks, have brought great success to many related tasks, such as medical applications and industrial automation. However, detecting objects with multiple aspect ratios and scales remains a key problem. This paper proposes a top-down and bottom-up feature pyramid network (TDBU-FPN), which combines multi-scale feature representation with anchor generation at multiple aspect ratios. First, to build the multi-scale feature maps, a number of fully convolutional layers are placed after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted: the top-down flow, implemented by deconvolution, introduces context information, and the bottom-up flow, implemented by pooling, supplements sub-original information. Third, to adapt to different object aspect ratios, anchors of many shapes with different aspect ratios are generated on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) dataset and reaches an accuracy of 79%, a 1.8% improvement, at a detection speed of 23 fps.
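The three steps in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: nearest-neighbour upsampling stands in for the learned deconvolution, fusion by element-wise sum is an assumption, and the function names (`max_pool_2x2`, `upsample_2x`, `fuse`, `make_anchors`) and the width/height convention for anchors are hypothetical choices made for illustration only.

```python
import math


def max_pool_2x2(fmap):
    """Bottom-up flow: halve resolution with 2x2 max pooling, stride 2."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[y][x], fmap[y][x + 1], fmap[y + 1][x], fmap[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def upsample_2x(fmap):
    """Top-down flow: nearest-neighbour 2x upsampling.

    Assumption: this replaces the learned deconvolution named in the abstract.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(finer, current, coarser):
    """Combine a feature map with both neighbours.

    Assumption: fusion is an element-wise sum; the abstract does not say
    how the flows are merged.
    """
    down = max_pool_2x2(finer)   # finer map brought down to current size
    up = upsample_2x(coarser)    # coarser map brought up to current size
    return [
        [current[y][x] + down[y][x] + up[y][x] for x in range(len(current[0]))]
        for y in range(len(current))
    ]


def make_anchors(fmap_size, scale, aspect_ratios):
    """One (cx, cy, w, h) anchor per cell and aspect ratio, normalised to [0, 1].

    Assumption: w = scale * sqrt(ratio), h = scale / sqrt(ratio), so every
    anchor at a given scale covers the same area.
    """
    anchors = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ratio in aspect_ratios:
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                anchors.append((cx, cy, w, h))
    return anchors
```

For example, `make_anchors(2, 0.2, [1.0, 2.0, 0.5])` yields three anchors for each of the four cells of a 2x2 feature map, and `fuse` expects the finer map to have twice, and the coarser map half, the side length of the current map.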

Original language	English
Pages (from-to)	1-12
Number of pages	12
Journal	Journal of Systems Engineering and Electronics
Volume	30
Issue number	1
DOI	https://doi.org/10.21629/JSEE.2019.01.01
Publication status	Published - Feb 2019


Cite this

@article{c51d5ec17f4c4cdfbf5afd303f28ddec,

title = "Multi-scale object detection by top-down and bottom-up feature pyramid network",

abstract = "While moving ahead with the object detection technology, especially deep neural networks, many related tasks, such as medical application and industrial automation, have achieved great success. However, the detection of objects with multiple aspect ratios and scales is still a key problem. This paper proposes a top-down and bottom-up feature pyramid network (TDBU-FPN), which combines multi-scale feature representation and anchor generation at multiple aspect ratios. First, in order to build the multi-scale feature map, this paper puts a number of fully convolutional layers after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted to introduce context information via top-down flow and supplement sub-original information via bottom-up flow. The top-down flow refers to the deconvolution procedure, and the bottom-up flow refers to the pooling procedure. Third, the problem of adapting different object aspect ratios is tackled via many anchor shapes with different aspect ratios on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) dataset and reaches an accuracy of 79%, which exhibits a 1.8% improvement with a detection speed of 23 fps.",

keywords = "convolutional neural network (CNN), deconvolution, feature pyramid network (FPN), object detection",

author = "Zhao, Baojun and Zhao, Boya and Tang, Linbo and Wang, Wenzheng and Wu, Chen",

note = "Publisher Copyright: {\textcopyright} 1990-2011 Beijing Institute of Aerospace Information.",

year = "2019",

month = feb,

doi = "10.21629/JSEE.2019.01.01",

language = "English",

volume = "30",

pages = "1--12",

journal = "Journal of Systems Engineering and Electronics",

issn = "1671-1793",

publisher = "Kexue Chubanshe/Science Press",

number = "1",

}

TY - JOUR

T1 - Multi-scale object detection by top-down and bottom-up feature pyramid network

AU - Zhao, Baojun

AU - Zhao, Boya

AU - Tang, Linbo

AU - Wang, Wenzheng

AU - Wu, Chen

PY - 2019/2

Y1 - 2019/2

N2 - While moving ahead with the object detection technology, especially deep neural networks, many related tasks, such as medical application and industrial automation, have achieved great success. However, the detection of objects with multiple aspect ratios and scales is still a key problem. This paper proposes a top-down and bottom-up feature pyramid network (TDBU-FPN), which combines multi-scale feature representation and anchor generation at multiple aspect ratios. First, in order to build the multi-scale feature map, this paper puts a number of fully convolutional layers after the backbone. Second, to link neighboring feature maps, top-down and bottom-up flows are adopted to introduce context information via top-down flow and supplement sub-original information via bottom-up flow. The top-down flow refers to the deconvolution procedure, and the bottom-up flow refers to the pooling procedure. Third, the problem of adapting different object aspect ratios is tackled via many anchor shapes with different aspect ratios on each multi-scale feature map. The proposed method is evaluated on the pattern analysis, statistical modeling and computational learning visual object classes (PASCAL VOC) dataset and reaches an accuracy of 79%, which exhibits a 1.8% improvement with a detection speed of 23 fps.

KW - convolutional neural network (CNN)

KW - deconvolution

KW - feature pyramid network (FPN)

KW - object detection

UR - http://www.scopus.com/inward/record.url?scp=85062697084&partnerID=8YFLogxK

U2 - 10.21629/JSEE.2019.01.01

DO - 10.21629/JSEE.2019.01.01

M3 - Article

AN - SCOPUS:85062697084

SN - 1671-1793

VL - 30

SP - 1

EP - 12

JO - Journal of Systems Engineering and Electronics

JF - Journal of Systems Engineering and Electronics

IS - 1

ER -
