Audio to Deep Visual: Speaking Mouth Generation Based on 3D Sparse Landmarks

Hui Fang, Dongdong Weng*, Zeyu Tian, Zhen Song*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Having a system that automatically generates a talking mouth in sync with input speech would enhance speech communication and enable many novel applications. This article presents a new model that generates 3D talking mouth landmarks from Chinese speech. We use sparse 3D landmarks to model the mouth motion; they are easy to capture and provide sufficient lip accuracy. The 4D mouth motion dataset was collected with our self-developed facial capture device, filling a gap in Chinese speech-driven lip datasets. The experimental results show that the generated talking landmarks achieve accurate, smooth, and natural 3D mouth movements.
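
The record includes no code, but as a rough illustration of the audio-to-landmark mapping the abstract describes, here is a minimal sketch: a recurrent network regresses per-frame 3D coordinates of sparse mouth landmarks from acoustic features. The GRU architecture, the use of MFCC inputs, the landmark count, and all names (AudioToLandmarks, n_mfcc, etc.) are assumptions for illustration, not the authors' model.

```python
# Hypothetical sketch of an audio-to-3D-landmark regressor.
# Architecture and hyperparameters are assumptions, not the paper's model.
import torch
import torch.nn as nn

class AudioToLandmarks(nn.Module):
    def __init__(self, n_mfcc=13, hidden=128, n_landmarks=20):
        super().__init__()
        # Encode the acoustic frame sequence with a 2-layer GRU.
        self.rnn = nn.GRU(n_mfcc, hidden, num_layers=2, batch_first=True)
        # Regress x, y, z for each sparse mouth landmark per frame.
        self.head = nn.Linear(hidden, n_landmarks * 3)

    def forward(self, mfcc):            # mfcc: (batch, frames, n_mfcc)
        feats, _ = self.rnn(mfcc)       # (batch, frames, hidden)
        out = self.head(feats)          # (batch, frames, n_landmarks * 3)
        return out.view(mfcc.size(0), mfcc.size(1), -1, 3)

# Usage: one second of 100 fps audio features -> per-frame landmarks.
model = AudioToLandmarks()
landmarks = model(torch.randn(1, 100, 13))
print(landmarks.shape)  # torch.Size([1, 100, 20, 3])
```

A sequence model of this kind produces temporally smooth trajectories, which matches the abstract's emphasis on smooth and natural mouth movements; the actual model, loss, and audio features used in the paper may differ.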

Original language: English
Host publication title: Proceedings - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 605-606
Number of pages: 2
ISBN (electronic): 9798350348392
DOI
Publication status: Published - 2023
Event: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023 - Shanghai, China
Duration: 25 Mar 2023 - 29 Mar 2023

Publication series

Name: Proceedings - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023

Conference

Conference: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
Country/Territory: China
City: Shanghai
Period: 25/03/23 - 29/03/23
