GesGPT: Speech Gesture Synthesis with Text Parsing from ChatGPT

Nan Gao; Zeyu Zhao; Zhi Zeng; Shuwu Zhang; Dongdong Weng; Yihua Bao

doi:10.1109/LRA.2024.3359544


Corresponding author for this work: Yihua Bao

School of Optics and Photonics

Research output: Contribution to journal › Article › Peer-review

2 Citations (Scopus)

Abstract

Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models, such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. Firstly, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis on emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures.
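The abstract describes the GesGPT pipeline only at a high level. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes the LLM has been prompted to return a JSON object containing an intent label plus emphasis and semantic words (the paper's actual prompt principles and output format are not given here). It also uses a hypothetical toy lexicon that maps intent labels to short pose sequences, and it overlays the selected professional gesture onto a base motion with a simple linear crossfade. All names (`parse_reply`, `LEXICON`, `merge`, `synthesize`), the two-value pose vectors, and the blending scheme are illustrative assumptions.

```python
import json
from dataclasses import dataclass

# ASSUMPTION: a structured reply we imagine the LLM was prompted to produce.
# The real GesGPT prompt and response format are not specified in this record.
EXAMPLE_LLM_REPLY = '{"intent": "enumerate", "emphasis_words": ["three"], "semantic_words": ["three"]}'


@dataclass
class TextParse:
    intent: str
    emphasis_words: list[str]
    semantic_words: list[str]


def parse_reply(reply: str) -> TextParse:
    """Turn the (assumed) JSON reply into an intent label plus word lists."""
    data = json.loads(reply)
    return TextParse(
        intent=data.get("intent", "none"),
        emphasis_words=list(data.get("emphasis_words", [])),
        semantic_words=list(data.get("semantic_words", [])),
    )


# HYPOTHETICAL lexicon: intent label -> professional gesture clip.
# Each frame is a toy pose vector of two joint values.
LEXICON: dict[str, list[list[float]]] = {
    "enumerate": [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
    "negate": [[-1.0, 0.0], [1.0, 0.0]],
}


def merge(base, professional, start, fade=1):
    """Overlay `professional` onto a copy of `base` starting at frame `start`.

    The blend weight ramps up over the first `fade` frames of the clip and
    down over the last `fade` frames; fade=0 replaces frames outright.
    """
    out = [frame[:] for frame in base]  # copy so the base motion is not mutated
    n = len(professional)
    for i, pose in enumerate(professional):
        t = start + i
        if t >= len(out):
            break  # clip runs past the end of the base motion
        ramp_in = (i + 1) / (fade + 1) if i < fade else 1.0
        ramp_out = (n - i) / (fade + 1) if i >= n - fade else 1.0
        w = min(ramp_in, ramp_out)
        out[t] = [(1 - w) * b + w * p for b, p in zip(out[t], pose)]
    return out


def synthesize(reply, base, word_frame):
    """Pick a professional gesture by intent and place it at the first
    semantic or emphasis word found in `word_frame` (word -> frame index)."""
    parsed = parse_reply(reply)
    clip = LEXICON.get(parsed.intent)
    if clip is None:
        return [frame[:] for frame in base]  # no lexicon match: keep base gesture
    words = parsed.semantic_words + parsed.emphasis_words
    anchors = [word_frame[w] for w in words if w in word_frame]
    start = anchors[0] if anchors else 0
    return merge(base, clip, start)


if __name__ == "__main__":
    base = [[0.0, 0.0] for _ in range(6)]  # idle base motion, 6 frames
    print(synthesize(EXAMPLE_LLM_REPLY, base, {"three": 2}))
    # -> [[0.0, 0.0], [0.0, 0.0], [0.5, 0.0], [1.0, 1.0], [0.5, 1.0], [0.0, 0.0]]
```

Keeping the LLM step behind a plain text-to-structure parser makes it easy to substitute a real model call later. The linear crossfade is only one simple way to avoid abrupt jumps where a professional gesture meets the base motion; the paper may merge the two differently.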

Original language	English
Pages (from-to)	2718-2725
Number of pages	8
Journal	IEEE Robotics and Automation Letters
Volume	9
Issue number	3
DOI	https://doi.org/10.1109/LRA.2024.3359544
Publication status	Published - 1 Mar 2024


Cite this

@article{19c75a1ea4eb419e89549bd0f696bc98,
title = "GesGPT: Speech Gesture Synthesis with Text Parsing from ChatGPT",
abstract = "Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models, such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. Firstly, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis on emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures.",
keywords = "Gesture synthesis, human robot interaction, large language model",
author = "Nan Gao and Zeyu Zhao and Zhi Zeng and Shuwu Zhang and Dongdong Weng and Yihua Bao",
note = "Publisher Copyright: {\textcopyright} 2016 IEEE.",
year = "2024",
month = mar,
day = "1",
doi = "10.1109/LRA.2024.3359544",
language = "English",
volume = "9",
pages = "2718--2725",
journal = "IEEE Robotics and Automation Letters",
issn = "2377-3766",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",
}

TY  - JOUR
T1  - GesGPT: Speech Gesture Synthesis with Text Parsing from ChatGPT
AU  - Gao, Nan
AU  - Zhao, Zeyu
AU  - Zeng, Zhi
AU  - Zhang, Shuwu
AU  - Weng, Dongdong
AU  - Bao, Yihua
PY  - 2024/3/1
Y1  - 2024/3/1
N2  - Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models, such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. Firstly, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis on emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures.
AB  - Gesture synthesis has gained significant attention as a critical research field, aiming to produce contextually appropriate and natural gestures corresponding to speech or textual input. Although deep learning-based approaches have achieved remarkable progress, they often overlook the rich semantic information present in the text, leading to less expressive and meaningful gestures. In this letter, we propose GesGPT, a novel approach to gesture generation that leverages the semantic analysis capabilities of large language models, such as ChatGPT. By capitalizing on the strengths of LLMs for text analysis, we adopt a controlled approach to generate and integrate professional gestures and base gestures through a text parsing script, resulting in diverse and meaningful gestures. Firstly, our approach involves the development of prompt principles that transform gesture generation into an intention classification problem using ChatGPT. We also conduct further analysis on emphasis words and semantic words to aid in gesture generation. Subsequently, we construct a specialized gesture lexicon with multiple semantic annotations, decoupling the synthesis of gestures into professional gestures and base gestures. Finally, we merge the professional gestures with base gestures. Experimental results demonstrate that GesGPT effectively generates contextually appropriate and expressive gestures.
KW  - Gesture synthesis
KW  - human robot interaction
KW  - large language model
UR  - http://www.scopus.com/inward/record.url?scp=85184326609&partnerID=8YFLogxK
U2  - 10.1109/LRA.2024.3359544
DO  - 10.1109/LRA.2024.3359544
M3  - Article
AN  - SCOPUS:85184326609
SN  - 2377-3766
VL  - 9
SP  - 2718
EP  - 2725
JO  - IEEE Robotics and Automation Letters
JF  - IEEE Robotics and Automation Letters
IS  - 3
ER  - 
