Audio to Deep Visual: Speaking Mouth Generation Based on 3D Sparse Landmarks

Hui Fang, Dongdong Weng*, Zeyu Tian, Zhen Song*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Having a system that automatically generates a talking mouth in sync with input speech would enhance speech communication and enable many novel applications. This article presents a new model that generates 3D talking mouth landmarks from Chinese speech. We use sparse 3D landmarks to model the mouth motion; they are easy to capture and provide sufficient lip accuracy. The 4D mouth motion dataset was collected with our self-developed facial capture device, filling a gap in Chinese speech-driven lip datasets. The experimental results show that the generated talking landmarks achieve accurate, smooth, and natural 3D mouth movements.
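
The record includes no code, but as a rough illustration of the audio-to-landmark mapping the abstract describes, here is a minimal sketch: a recurrent network regresses per-frame 3D coordinates of sparse mouth landmarks from acoustic features. The GRU architecture, the use of MFCC inputs, the landmark count, and all names (AudioToLandmarks, n_mfcc, etc.) are assumptions for illustration, not the authors' model.

```python
# Hypothetical sketch of an audio-to-3D-landmark regressor.
# Architecture and hyperparameters are assumptions, not the paper's model.
import torch
import torch.nn as nn

class AudioToLandmarks(nn.Module):
    def __init__(self, n_mfcc=13, hidden=128, n_landmarks=20):
        super().__init__()
        # Encode the acoustic frame sequence with a 2-layer GRU.
        self.rnn = nn.GRU(n_mfcc, hidden, num_layers=2, batch_first=True)
        # Regress x, y, z for each sparse mouth landmark per frame.
        self.head = nn.Linear(hidden, n_landmarks * 3)

    def forward(self, mfcc):            # mfcc: (batch, frames, n_mfcc)
        feats, _ = self.rnn(mfcc)       # (batch, frames, hidden)
        out = self.head(feats)          # (batch, frames, n_landmarks * 3)
        return out.view(mfcc.size(0), mfcc.size(1), -1, 3)

# Usage: one second of 100 fps audio features -> per-frame landmarks.
model = AudioToLandmarks()
landmarks = model(torch.randn(1, 100, 13))
print(landmarks.shape)  # torch.Size([1, 100, 20, 3])
```

A sequence model of this kind produces temporally smooth trajectories, which matches the abstract's emphasis on smooth and natural mouth movements; the actual model, loss, and audio features used in the paper may differ.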

Original language: English
Host publication title: Proceedings - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 605-606
Number of pages: 2
ISBN (electronic): 9798350348392
DOI
Publication status: Published - 2023
Event: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023 - Shanghai, China
Duration: 25 Mar 2023 - 29 Mar 2023

Publication series

Name: Proceedings - 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023

Conference

Conference: 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2023
Country/Territory: China
City: Shanghai
Period: 25/03/23 - 29/03/23
