SARGes: Semantically Aligned Reliable Gesture Generation via Intent Chain

  • Nan Gao*
  • Yihua Bao
  • Dongdong Weng
  • Jiayi Zhao
  • Jia Li
  • Yan Zhou
  • Pengfei Wan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Co-speech gesture generation enhances human-computer interaction realism through speech-synchronized gesture synthesis. However, generating semantically meaningful gestures remains a challenging problem. We propose SARGes, a novel framework that leverages large language models (LLMs) to construct an intent chain for parsing speech content and generating reliable semantic gesture labels, which subsequently guide the synthesis of meaningful co-speech gestures. First, we constructed a comprehensive co-speech gesture ethogram and developed an LLM-based intent chain reasoning mechanism that systematically parses and decomposes gesture semantics into structured inference steps following ethogram criteria, effectively guiding LLMs to parse context-aware gesture labels. Subsequently, we constructed a text-to-gesture label dataset and trained a lightweight gesture label generation model, which then guides the generation of credible and semantically coherent co-speech gestures. Experimental results show that SARGes achieves gesture labeling performance comparable to GPT-4 in intent interpretation, with efficient single-pass inference (0.4 seconds), and significantly improves the semantic expressiveness of gesture generation.

Original language: English
Title of host publication: GENEA 2025 - Proceedings of the International Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents, co-located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 13-21
Number of pages: 9
ISBN (Electronic): 9798400720505
Publication status: Published - 26 Oct 2025
Event: Workshop on Generation and Evaluation of Nonverbal Behaviour for Embodied Agents, GENEA Workshop 2025 - Dublin, Ireland
Duration: 31 Oct 2025 to 31 Oct 2025

Publication series

Name: GENEA 2025 - Proceedings of the International Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents, co-located with MM 2025

Conference

Conference: Workshop on Generation and Evaluation of Nonverbal Behaviour for Embodied Agents, GENEA Workshop 2025
Country/Territory: Ireland
City: Dublin
Period: 31/10/25 to 31/10/25

Keywords

  • co-speech gesture generation
  • gesture ethogram
  • intent chain
  • large language models
