TY - GEN
T1 - IPG-Net
T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2020
AU - Liu, Ziming
AU - Gao, Guangyu
AU - Sun, Lin
AU - Fang, Li
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/6
Y1 - 2020/6
N2 - Convolutional Neural Network-based object detection faces a typical dilemma: spatial information is well preserved in the shallow layers, which unfortunately lack semantic information, while the deep layers carry rich semantics but lose much of the spatial information, resulting in a serious information imbalance. To supply the shallow layers with semantic information, the Feature Pyramid Network (FPN) builds a top-down propagation path. In this paper, beyond the top-down combination of information for shallow layers, we propose a novel network called the Image Pyramid Guidance Network (IPG-Net), which ensures that both spatial and semantic information are abundant in every layer. IPG-Net has two main parts: the image pyramid guidance transformation module and the image pyramid guidance fusion module. Our main idea is to introduce image pyramid guidance into the backbone stream to solve the information imbalance problem, which alleviates the vanishing of small object features. The IPG transformation module ensures that even in the deepest stage of the backbone there is enough spatial information for bounding box regression and classification. Furthermore, we design an effective fusion module to fuse the features from the image pyramid with the features from the backbone stream. We apply this network to both one-stage and two-stage detection models and obtain state-of-the-art results on the most popular benchmark data sets, i.e., MS COCO and Pascal VOC.
AB - Convolutional Neural Network-based object detection faces a typical dilemma: spatial information is well preserved in the shallow layers, which unfortunately lack semantic information, while the deep layers carry rich semantics but lose much of the spatial information, resulting in a serious information imbalance. To supply the shallow layers with semantic information, the Feature Pyramid Network (FPN) builds a top-down propagation path. In this paper, beyond the top-down combination of information for shallow layers, we propose a novel network called the Image Pyramid Guidance Network (IPG-Net), which ensures that both spatial and semantic information are abundant in every layer. IPG-Net has two main parts: the image pyramid guidance transformation module and the image pyramid guidance fusion module. Our main idea is to introduce image pyramid guidance into the backbone stream to solve the information imbalance problem, which alleviates the vanishing of small object features. The IPG transformation module ensures that even in the deepest stage of the backbone there is enough spatial information for bounding box regression and classification. Furthermore, we design an effective fusion module to fuse the features from the image pyramid with the features from the backbone stream. We apply this network to both one-stage and two-stage detection models and obtain state-of-the-art results on the most popular benchmark data sets, i.e., MS COCO and Pascal VOC.
UR - http://www.scopus.com/inward/record.url?scp=85090127381&partnerID=8YFLogxK
U2 - 10.1109/CVPRW50498.2020.00521
DO - 10.1109/CVPRW50498.2020.00521
M3 - Conference contribution
AN - SCOPUS:85090127381
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 4422
EP - 4430
BT - Proceedings - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2020
PB - IEEE Computer Society
Y2 - 14 June 2020 through 19 June 2020
ER -