TY - JOUR
T1 - Research on lip synthesis integrating audio-visual synchronization
AU - Jin, Cong
AU - Wang, Jie
AU - Guo, Zichun
AU - Wang, Jing
N1 - Publisher Copyright:
© 2022 Chinese Medical Association. All rights reserved.
PY - 2023/9/15
Y1 - 2023/9/15
N2 - With the rapid growth of video-based information dissemination, audio-video synchronization has become an important criterion for assessing video quality. Deep synthesis technology has entered the public eye in the field of international communication, and lip-sync technology that integrates audio and video synchronization has attracted increasing attention. Existing lip-synthesis models mainly perform lip synthesis on static images, are less effective on dynamic video, and are mostly trained on English datasets, which results in poor synthesis quality for Mandarin Chinese. To address these problems, this paper conducted optimization experiments on the Wav2Lip lip-synthesis model in a Chinese-language context, building on its existing research foundation, and evaluated different training routes through multiple sets of experiments, providing useful reference points for subsequent Wav2Lip research. The study extended lip synthesis from speech-driven to text-driven generation, discussed applications of lip synthesis in fields such as virtual digital humans, and laid the groundwork for the broader application and development of lip-synthesis technology.
AB - With the rapid growth of video-based information dissemination, audio-video synchronization has become an important criterion for assessing video quality. Deep synthesis technology has entered the public eye in the field of international communication, and lip-sync technology that integrates audio and video synchronization has attracted increasing attention. Existing lip-synthesis models mainly perform lip synthesis on static images, are less effective on dynamic video, and are mostly trained on English datasets, which results in poor synthesis quality for Mandarin Chinese. To address these problems, this paper conducted optimization experiments on the Wav2Lip lip-synthesis model in a Chinese-language context, building on its existing research foundation, and evaluated different training routes through multiple sets of experiments, providing useful reference points for subsequent Wav2Lip research. The study extended lip synthesis from speech-driven to text-driven generation, discussed applications of lip synthesis in fields such as virtual digital humans, and laid the groundwork for the broader application and development of lip-synthesis technology.
KW - artificial intelligence
KW - computer visualization
KW - deep learning
KW - lip generation
KW - synchronization of audio and video
UR - http://www.scopus.com/inward/record.url?scp=85175338275&partnerID=8YFLogxK
U2 - 10.11959/j.issn.2096-6652.202335
DO - 10.11959/j.issn.2096-6652.202335
M3 - Article
AN - SCOPUS:85175338275
SN - 2096-6652
VL - 5
SP - 397
EP - 405
JO - Chinese Journal of Intelligent Science and Technology
JF - Chinese Journal of Intelligent Science and Technology
IS - 3
ER -