TY - JOUR
T1 - RR-Net
T2 - Relation Reasoning for End-to-End Human-Object Interaction Detection
AU - Yang, Dongming
AU - Zou, Yuexian
AU - Zhang, Can
AU - Cao, Meng
AU - Chen, Jie
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2022/6/1
Y1 - 2022/6/1
AB - The task of Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects by inferring fine-grained triplets of (human, verb, object). Most HOI feature learning techniques depend on pre-detected instance regions or human body-part regions, which are computationally expensive and hardly applicable to end-to-end detectors in real applications. In this paper, based on an end-to-end HOI detector, we make a first attempt to explore region-independent relation reasoning for HOI detection. We first present a Relation-Aware Frame, which provides a progressive structure for interaction inference. Upon the Relation-Aware Frame, an Interaction Intensifier Module and a Correlation Parsing Module are designed, where: a) interactive semantics from humans are exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects and interactions are integrated to promote predictions. Based on the modules above, we construct a fully differentiable and end-to-end trainable network named Relation Reasoning Network (abbr. RR-Net). Extensive experiments show that the proposed RR-Net achieves competitive results compared with state-of-the-art methods on both the V-COCO and HICO-DET benchmarks and improves on the baseline by about 7.6% and 11.1% in relative terms, validating that this first effort in exploring region-independent relation reasoning brings a clear improvement for end-to-end HOI detection.
KW - End-to-end
KW - Human-object interaction
KW - Interactive representation
KW - Relation reasoning
UR - http://www.scopus.com/inward/record.url?scp=85117806126&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2021.3119892
DO - 10.1109/TCSVT.2021.3119892
M3 - Article
AN - SCOPUS:85117806126
SN - 1051-8215
VL - 32
SP - 3853
EP - 3865
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 6
ER -