Attentive contexts for object detection

Jianan Li, Yunchao Wei, Xiaodan Liang, Jian Dong, Tingfa Xu*, Jiashi Feng, Shuicheng Yan

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

213 Citations (Scopus)
PlumX metrics: 214 citations (Citation Indexes), 187 readers (Captures)

Abstract

Modern deep neural network-based object detection methods typically classify candidate proposals using only their interior features. However, the global and local surrounding contexts that are believed to be valuable for object detection have not yet been fully exploited by existing methods. In this work, we take a step towards understanding how to robustly extract and utilize contextual information to facilitate object detection. Specifically, we consider the following two questions: "how to identify useful global contextual information for detecting a certain object?" and "how to exploit local context surrounding a proposal for better inferring its contents?" We provide preliminary answers to these questions through developing a novel attention to context convolution neural network (AC-CNN)-based object detection model. AC-CNN effectively incorporates global and local contextual information into the region-based CNN (e.g., fast R-CNN and faster R-CNN) detection framework and provides better object detection performance. It consists of one attention-based global contextualized (AGC) subnetwork and one multi-scale local contextualized (MLC) subnetwork. To capture global context, the AGC subnetwork recurrently generates an attention map for an input image to highlight useful global contextual locations, through multiple stacked long short-term memory layers. To capture surrounding local context, the MLC subnetwork exploits both the inside and outside contextual information of each specific proposal at multiple scales. The global and local contexts are then fused together to make the final detection decision. Extensive experiments on PASCAL VOC 2007 and VOC 2012 demonstrate the superiority of the proposed AC-CNN over well-established baselines.
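The MLC idea above — pooling features from nested regions both inside and outside each proposal and fusing them — can be sketched in a few lines of NumPy. This is a minimal illustration under assumed parameters (the scale factors `0.8/1.2/1.8`, the 7×7 pooled size, and the helper names `scaled_box`, `adaptive_max_pool`, and `multi_scale_local_context` are all hypothetical choices for this sketch, not the paper's actual implementation):

```python
import numpy as np

def scaled_box(box, scale, h, w):
    """Enlarge/shrink a proposal (x1, y1, x2, y2) about its center,
    clipped to the feature-map bounds (hypothetical helper)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    bw, bh = (x2 - x1) * scale, (y2 - y1) * scale
    return (int(max(0, np.floor(cx - bw / 2))), int(max(0, np.floor(cy - bh / 2))),
            int(min(w, np.ceil(cx + bw / 2))), int(min(h, np.ceil(cy + bh / 2))))

def adaptive_max_pool(region, out_size):
    """Max-pool a (C, h, w) region to (C, out_size, out_size), RoI-pooling style."""
    c, h, w = region.shape
    out = np.empty((c, out_size, out_size))
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            y0, y1 = ys[i], max(ys[i + 1], ys[i] + 1)  # ensure non-empty bin
            x0, x1 = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, y0:y1, x0:x1].max(axis=(1, 2))
    return out

def multi_scale_local_context(feat, box, scales=(0.8, 1.2, 1.8), out_size=7):
    """Pool features from regions inside (scale < 1) and outside (scale > 1)
    a proposal, then concatenate them along the channel axis."""
    c, h, w = feat.shape
    pooled = []
    for s in scales:
        x1, y1, x2, y2 = scaled_box(box, s, h, w)
        pooled.append(adaptive_max_pool(feat[:, y1:y2, x1:x2], out_size))
    return np.concatenate(pooled, axis=0)  # shape: (len(scales) * C, out, out)

feat = np.random.rand(8, 32, 32)                       # toy conv feature map (C, H, W)
desc = multi_scale_local_context(feat, (10, 10, 22, 22))
print(desc.shape)                                      # (24, 7, 7)
```

In a full detector, this concatenated multi-scale descriptor would then be fed to the classification and regression heads alongside the attention-weighted global feature.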

Original language: English
Article number: 7792742
Pages (from-to): 944-954
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 19
Issue number: 5
DOI: 10.1109/TMM.2016.2642789
Publication status: Published - May 2017

Cite this

Li, J., Wei, Y., Liang, X., Dong, J., Xu, T., Feng, J., & Yan, S. (2017). Attentive contexts for object detection. IEEE Transactions on Multimedia, 19(5), 944-954. Article 7792742. https://doi.org/10.1109/TMM.2016.2642789