Cascaded human-object interaction recognition

Tianfei Zhou; Wenguan Wang; Siyuan Qi; Haibin Ling; Jianbing Shen

doi:10.1109/CVPR42600.2020.00432

Cascaded human-object interaction recognition

Tianfei Zhou, Wenguan Wang, Siyuan Qi, Haibin Ling, Jianbing Shen^*

^*此作品的通讯作者

科研成果: 期刊稿件 › 会议文章 › 同行评审

113 引用（Scopus）

摘要

Rapid progress has been witnessed for human-object interaction (HOI) recognition, but most existing models are confined to single-stage reasoning pipelines. Considering the intrinsic complexity of the task, we introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. Each of the two networks is also connected to its predecessor at the previous stage, enabling cross-stage information propagation. The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding. Further beyond relation detection on a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. Our approach reached the 1^st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation tasks. It also shows promising results on V-COCO.

源语言	英语
文章编号	9156410
页（从-至）	4262-4271
页数	10
期刊	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOI	https://doi.org/10.1109/CVPR42600.2020.00432
出版状态	已出版 - 2020
已对外发布	是
活动	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, 美国期限: 14 6月 2020 → 19 6月 2020

访问文件

10.1109/CVPR42600.2020.00432

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{1c15e4e449b74283b876536925d3c21a,

title = "Cascaded human-object interaction recognition",

abstract = "Rapid progress has been witnessed for human-object interaction (HOI) recognition, but most existing models are confined to single-stage reasoning pipelines. Considering the intrinsic complexity of the task, we introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. Each of the two networks is also connected to its predecessor at the previous stage, enabling cross-stage information propagation. The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding. Further beyond relation detection on a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. Our approach reached the 1st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation tasks. It also shows promising results on V-COCO.",

author = "Tianfei Zhou and Wenguan Wang and Siyuan Qi and Haibin Ling and Jianbing Shen",

note = "Publisher Copyright: {\textcopyright} 2020 IEEE; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 ; Conference date: 14-06-2020 Through 19-06-2020",

year = "2020",

doi = "10.1109/CVPR42600.2020.00432",

language = "English",

pages = "4262--4271",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Cascaded human-object interaction recognition

AU - Zhou, Tianfei

AU - Wang, Wenguan

AU - Qi, Siyuan

AU - Ling, Haibin

AU - Shen, Jianbing

PY - 2020

Y1 - 2020

N2 - Rapid progress has been witnessed for human-object interaction (HOI) recognition, but most existing models are confined to single-stage reasoning pipelines. Considering the intrinsic complexity of the task, we introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. Each of the two networks is also connected to its predecessor at the previous stage, enabling cross-stage information propagation. The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding. Further beyond relation detection on a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. Our approach reached the 1st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation tasks. It also shows promising results on V-COCO.

AB - Rapid progress has been witnessed for human-object interaction (HOI) recognition, but most existing models are confined to single-stage reasoning pipelines. Considering the intrinsic complexity of the task, we introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. Each of the two networks is also connected to its predecessor at the previous stage, enabling cross-stage information propagation. The interaction recognition network has two crucial parts: a relation ranking module for high-quality HOI proposal selection and a triple-stream classifier for relation prediction. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding. Further beyond relation detection on a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. Our approach reached the 1st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation tasks. It also shows promising results on V-COCO.

UR - http://www.scopus.com/inward/record.url?scp=85089575920&partnerID=8YFLogxK

U2 - 10.1109/CVPR42600.2020.00432

DO - 10.1109/CVPR42600.2020.00432

M3 - Conference article

AN - SCOPUS:85089575920

SN - 1063-6919

SP - 4262

EP - 4271

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

M1 - 9156410

T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020

Y2 - 14 June 2020 through 19 June 2020

ER -

Cascaded human-object interaction recognition

摘要

访问文件

其它文件与链接

指纹

引用此