
Decoupled Two-Stage Talking Head Generation via Gaussian-Landmark-Based Neural Radiance Fields

  • Boyao Ma
  • Yuanping Cao
  • Lei Zhang*
  • *Corresponding author for this work
  • Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Talking head generation based on neural radiance fields (NeRF) has gained prominence, primarily owing to its implicit 3D representation capability within neural networks. However, most NeRF-based methods intertwine audio-to-video conversion in a joint training process, resulting in challenges such as inadequate lip synchronization, limited learning efficiency, large memory requirements, and a lack of editability. In response to these issues, this paper introduces a fully decoupled NeRF-based method for generating talking heads. The method separates audio-to-video conversion into two stages through the use of facial landmarks. Notably, a Transformer network is used to establish the cross-modal connection between audio and landmarks and to generate landmarks that conform to the distribution of the training data. We also explore formant features of the audio as additional conditions to guide landmark generation. These landmarks are then combined with Gaussian relative position coding to refine the sampling points on the rays, thereby constructing a dynamic NeRF conditioned on these landmarks and audio features for rendering the generated head. This decoupled setup enhances both the fidelity and flexibility of mapping audio to video with two independent small-scale networks. Additionally, it supports generating the torso from the head-only image with an improved StyleUnet, further enhancing the realism of the generated talking head. Our experimental results demonstrate that the method excels at producing lifelike talking heads, and that the lightweight neural network models also exhibit superior speed and learning efficiency with lower memory requirements.

Original language: English
Pages (from-to): 799-816
Number of pages: 18
Journal: Computational Visual Media
Volume: 11
Issue number: 4
DOI
Publication status: Published - 2025
