Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation

  • Biao Dong
  • Lei Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

NeRF-based talking head generation has made great progress, but existing methods still fall short of high-quality detail fidelity, chiefly manifested as detail loss and intermittent blur. We attribute this to the limited viewpoint and lighting coverage of the training video data, which prevents full modeling of the global depth and brightness information of spatial points. Specifically, a fixed viewpoint may fail to provide sufficient depth information for high-frequency details, leading to inaccurate volume density estimation and the loss of details such as hair. Furthermore, constant lighting often fails to adapt to the drastic brightness changes across consecutive video frames, resulting in color accumulation errors and blurring artifacts. To address these issues, we propose a novel talking head generation method that combines layered viewpoint simulation (LVS) and continuous lighting simulation (CLS). LVS simulates multiple viewpoints from the multi-scale features of the video frame to construct a global depth representation, which improves the accuracy of volume density estimation and enhances detail description. CLS simulates multiple lighting conditions from the brightness changes of consecutive video frames to construct a global brightness representation, thereby alleviating color accumulation errors and eliminating blur. Extensive experiments demonstrate that our method significantly improves detail quality compared to state-of-the-art methods.
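As background for the abstract's claims about volume density and color accumulation, the mechanism at play is the standard NeRF volume rendering integral (common to all NeRF-based methods, not a construct specific to this paper); the symbols below follow the usual NeRF notation and are assumptions of this note, not the authors':

C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt, \qquad T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right),

where \sigma is the volume density, \mathbf{c} the view-dependent color along the ray \mathbf{r}(t) = \mathbf{o} + t\mathbf{d}, and T the accumulated transmittance. Because the rendered color C(\mathbf{r}) integrates both \sigma and \mathbf{c} along the ray, inaccurate density estimates or frame-to-frame brightness mismatches propagate directly into the accumulated color, which is the source of the detail loss and color accumulation errors the abstract describes.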

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 10258-10267
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • lighting
  • multimodality
  • neural radiance fields
  • talking head
  • viewpoint
