Cascaded Parsing of Human-Object Interaction Recognition

Tianfei Zhou, Siyuan Qi, Wenguan Wang*, Jianbing Shen, Song Chun Zhu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

80 Citations (Scopus)

Abstract

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines HOI proposals and feeds them into a structured interaction reasoning module. Each of the two modules is also connected to its predecessor in the previous stage, enabling efficient cross-stage information propagation. The structured interaction reasoning module is built upon a graph parsing neural network (GPNN), which efficiently models potential HOI structures as graphs and mines rich context for comprehensive relation understanding. In particular, GPNN infers a parse graph that i) interprets meaningful HOI structures through a learnable adjacency matrix, and ii) predicts action (edge) labels. Within an end-to-end, message-passing framework, GPNN blends learning and inference, iteratively parsing HOI structures and reasoning over HOI representations (i.e., instance and relation features). Going beyond relation detection at the bounding-box level, we extend our framework to perform fine-grained, pixel-wise relation segmentation; this offers a new view toward better relation modeling. A preliminary version of our CP-HOI model took 1st place in the ICCV 2019 Person in Context Challenge, on both relation detection and segmentation. In addition, CP-HOI shows promising results on two popular HOI recognition benchmarks, i.e., V-COCO and HICO-DET.
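
The parse-graph inference described in the abstract (a soft adjacency matrix, message passing over it, and per-edge action labels) can be illustrated with a toy sketch. This is not the authors' implementation: the function name `parse_graph`, the step names (link, message, update, readout), the randomly initialised weights, and the feature sizes are all assumptions made for illustration, and a real model would learn the weights end to end.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    return [dot(row, v) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def parse_graph(node_feats, num_actions, steps=2, seed=0):
    """Toy parse-graph inference: soft adjacency plus per-edge action scores."""
    rng = random.Random(seed)
    n, d = len(node_feats), len(node_feats[0])
    # Hypothetical parameters; a trained model would learn these.
    w_link = [rng.uniform(-0.5, 0.5) for _ in range(2 * d)]
    W_msg = random_matrix(d, d, rng)
    W_upd = random_matrix(d, 2 * d, rng)
    W_act = random_matrix(num_actions, 2 * d, rng)

    h = [list(x) for x in node_feats]
    adj = [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        # Link step: soft connectivity for every ordered pair of distinct nodes.
        for i in range(n):
            for j in range(n):
                adj[i][j] = 0.0 if i == j else sigmoid(dot(w_link, h[i] + h[j]))
        # Message step: adjacency-weighted sum of transformed neighbour states.
        msgs = []
        for i in range(n):
            m = [0.0] * d
            for j in range(n):
                t = matvec(W_msg, h[j])
                m = [a + adj[i][j] * b for a, b in zip(m, t)]
            msgs.append(m)
        # Update step: new node state from the old state and incoming message.
        h = [[math.tanh(v) for v in matvec(W_upd, h[i] + msgs[i])] for i in range(n)]
    # Readout step: independent (multi-label) action scores for each edge.
    actions = [
        [[sigmoid(s) for s in matvec(W_act, h[i] + h[j])] for j in range(n)]
        for i in range(n)
    ]
    return adj, actions


# Usage: three instances (e.g. one human, two objects) with 3-d features.
feats = [[1.0, 0.0, 0.5], [0.2, 0.3, 0.1], [0.0, 1.0, 0.4]]
adj, actions = parse_graph(feats, num_actions=4)
```

Here `adj[i][j]` plays the role of the adjacency entry between instances `i` and `j`, and `actions[i][j]` holds one score per action for that edge. The returned adjacency is the one computed in the final iteration, before the last node update.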

Original language: English
Pages (from-to): 2827-2840
Number of pages: 14
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 6
Publication status: Published - 1 Jun 2022
Externally published: Yes

Keywords

  • Human-object interaction recognition
  • cascaded parsing
  • fine-grained relation segmentation
