RR-Net: Relation Reasoning for End-To-End Human-Object Interaction Detection

Dongming Yang, Yuexian Zou*, Can Zhang, Meng Cao, Jie Chen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)

Abstract

Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects by inferring fine-grained triplets of (human, verb, object). Most HOI feature learning techniques depend on pre-detected instance regions or human body-part regions, which are computationally expensive and hard to apply to end-to-end detectors in real applications. In this paper, building on an end-to-end HOI detector, we make a first attempt to explore region-independent relation reasoning for HOI detection. We first present a Relation-Aware Frame, which provides a progressive structure for interaction inference. On top of the Relation-Aware Frame, we design an Interaction Intensifier Module and a Correlation Parsing Module, in which: a) interactive semantics from humans are exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects and interactions are integrated to improve predictions. Based on these modules, we construct a fully differentiable, end-to-end trainable network named Relation Reasoning Network (RR-Net). Extensive experiments show that RR-Net achieves results competitive with state-of-the-art methods on both the V-COCO and HICO-DET benchmarks and improves on the baseline by about 7.6% and 11.1% in relative terms, validating that this first effort to explore region-independent relation reasoning brings a clear improvement for end-to-end HOI detection.

Original language: English
Pages (from-to): 3853-3865
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 32
Issue number: 6
DOIs
Publication status: Published - 1 Jun 2022
Externally published: Yes

Keywords

  • End-to-end
  • Human-object interaction
  • Interactive representation
  • Relation reasoning
