Human interaction recognition using spatio-temporal words

Lei Han, Jun Feng Li, Yun De Jia*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

This paper proposes a hierarchical approach for recognizing person-to-person interactions in indoor scenes from a single view, based on spatial-temporal feature extraction and representation. The dense space-time interest points detected in the videos are divided into two mutually exclusive sets according to the history information along the evolution and the connectivity of the two human silhouettes. K-means clustering is then performed on the points in the training set to learn the spatial-temporal codebook. For a given set of interest points, a spatial-temporal word is built by letting each point vote softly for the few centers nearest to it and accumulating the scores of all the points. A Conditional Random Field (CRF), whose inputs are the spatial-temporal words, is used to model the primitive actions of each person, and common-sense domain knowledge and weighted first-order logic production rules are employed to learn the structure and the parameters of a Markov Logic Network (MLN). The MLN naturally integrates common-sense reasoning with uncertainty analysis and is therefore capable of dealing with the uncertainty produced by the CRF. Experimental results on the interaction dataset are provided to demonstrate the effectiveness and robustness of the approach.
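The codebook and soft-voting steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the descriptor, the codebook size, the number of nearest centers, or the voting weights, so the function names `learn_codebook` and `soft_vote_word`, the parameters `n_nearest` and `sigma`, the Gaussian weighting, and the random stand-in descriptors are all assumptions.

```python
import numpy as np

def learn_codebook(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct training points.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def soft_vote_word(points, centers, n_nearest=3, sigma=1.0):
    """Histogram over the codebook; each point spreads one unit of vote
    over its n_nearest centers (assumed Gaussian weights on distance)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_nearest]      # indices, closest first
    near_d2 = np.take_along_axis(d2, nearest, axis=1)
    # Subtract each row's minimum before exponentiating for numerical stability;
    # the normalised weights are unchanged by this shift.
    w = np.exp(-(near_d2 - near_d2[:, :1]) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    word = np.zeros(len(centers))
    np.add.at(word, nearest, w)                          # accumulate all votes
    return word

# Random vectors stand in for interest-point descriptors (hypothetical data).
rng = np.random.default_rng(0)
train_points = rng.normal(size=(200, 5))
codebook = learn_codebook(train_points, k=8)
word = soft_vote_word(rng.normal(size=(30, 5)), codebook)
```

Because each point's weights are normalised to sum to one, the entries of `word` add up to the number of interest points in the set. Compared with hard assignment to a single center, this spreads the vote of a point that lies between clusters, which is the usual motivation for soft voting.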

Original language: English
Pages (from-to): 776-784
Number of pages: 9
Journal: Jisuanji Xuebao/Chinese Journal of Computers
Volume: 33
Issue number: 4
Publication status: Published - Apr 2010

Keywords

  • Action recognition
  • Conditional random field
  • Interaction analysis
  • Markov logic network
  • Spatial-temporal feature
