
A general description generator for human activity images based on deep understanding framework

  • Zheng Zhou
  • Kan Li*
  • Lin Bai
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Guangxi University

Research output: Contribution to journal › Article › peer-review

Abstract

Image description generation has substantial practical value for online image search. Inspired by recent advances in the study of the neocortex, we design a deep image understanding framework that implements a description generator for general images involving human activities. Unlike existing work on image description, which treats the task as a retrieval problem rather than trying to understand the image, our framework recognizes the human–object interaction (HOI) activity in an image through co-occurrence analysis of the 3-D spatial layout and generates a natural language description of what is actually happening in the image. We propose a deep hierarchical model for image recognition and a syntactic tree-based model for natural language generation. To support online image search, both models are designed to extract features uniformly from humans and different object classes and to produce well-formed sentences describing exactly what is happening in the image. In experiments on a dataset containing images from the phrasal recognition dataset, the six-class sports dataset, and the UIUC Pascal sentence dataset, we demonstrate that our framework outperforms state-of-the-art methods at recognizing HOI activities and generating image descriptions.

Original language: English
Pages (from-to): 2147-2163
Number of pages: 17
Journal: Neural Computing and Applications
Volume: 28
Issue number: 8
Publication status: Published - 1 Aug 2017

Keywords

  • 3-D spatial context
  • Deep hierarchical model
  • Factored three-way interaction
  • Human–object interaction activity
  • Image description generation
