Multi-view Intention Recognition in Face-to-Face Communication

Pukun Chen, Dongdong Weng*, Xiaonuo Dongye

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose an intention recognition method based on a generated dataset. Addressing the lack of intention datasets and recognition methods for face-to-face communication scenarios, we analyze the motions corresponding to intentions and generate a motion dataset using a diffusion model. We then employ a Transformer-based method to map video to intention. In addition, we introduce a joint intention processing method that effectively handles the differences in motion semantics across camera views, yielding more accurate recognition on multi-view data. Overall, this article presents a unified framework from data acquisition to recognition.
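The abstract's joint multi-view processing is not specified here, but one minimal way to combine per-view predictions is late fusion: average the class probabilities each camera view produces and pick the most likely intention. The sketch below assumes a hypothetical label set and per-view classifier logits; it is an illustration of late fusion, not the paper's actual method.

```python
import math

INTENTIONS = ["greet", "handshake", "point", "wave"]  # hypothetical label set

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_views(per_view_logits):
    """Average per-view class probabilities, then pick the top intention.

    Averaging probabilities (rather than raw logits) down-weights a single
    overconfident view, a simple stand-in for jointly handling
    view-dependent motion semantics.
    """
    probs = [softmax(logits) for logits in per_view_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return INTENTIONS[best], avg

# Example: the front view is confident in "wave"; the side view is ambiguous.
pred, avg = fuse_views([[0.1, 0.2, 0.5, 3.0], [0.3, 0.1, 1.2, 0.9]])
```

More elaborate alternatives (attention over views, or fusing features before classification) follow the same interface: several per-view signals in, one intention out.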

Original language: English
Title of host publication: Image and Graphics Technologies and Applications - 19th Chinese Conference, IGTA 2024, Revised Selected Papers
Editors: Yongtian Wang, Hua Huang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 327-338
Number of pages: 12
ISBN (Print): 9789819799183
DOIs
Publication status: Published - 2025
Event: 19th Chinese Conference on Image and Graphics Technologies and Applications, IGTA 2024 - Beijing, China
Duration: 16 Aug 2024 - 18 Aug 2024

Publication series

Name: Communications in Computer and Information Science
Volume: 2302 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 19th Chinese Conference on Image and Graphics Technologies and Applications, IGTA 2024
Country/Territory: China
City: Beijing
Period: 16/08/24 - 18/08/24

Keywords

  • face-to-face communication
  • motion diffusion
  • multi-view recognition
