Cetus: Online Context-Aware Cross-Layer Coordination for Efficient Live Volumetric Video Streaming

Research output: Contribution to journal › Article › peer-review

Abstract

In recent years, volumetric video has emerged as a compelling video paradigm, offering users a fully immersive viewing experience with six Degrees of Freedom (DoF). However, most current live volumetric video streaming methods struggle to meet real-time performance requirements because of frequent user interactions and complex network conditions during playback. Inspired by the correlation between human visual effects and the motion features of adjacent frames, we propose Cetus, a context-aware cross-layer coordination system for live volumetric videos. First, we present an application-layer Neural Radiance Fields (NeRF)-based codec framework that leverages spatio-temporal semantic information to optimize the compression quality of each video frame. Second, we employ a flexible cross-layer coordination framework that integrates a frame drop strategy with partially reliable transmission, orchestrating transport protocols and application-informed rates to enhance the Quality of Experience (QoE) for multiple users. Furthermore, we develop a lightweight branching decision tree algorithm that adaptively makes fine-grained frame drop decisions. Experimental evaluations of our system prototype show that Cetus outperforms existing approaches: compared to state-of-the-art baselines, it improves video frame rate by at least 24.7% and video quality by an average of 32.6%.

Original language: English
Pages (from-to): 2076-2091
Number of pages: 16
Journal: IEEE Transactions on Networking
Volume: 34
DOIs
Publication status: Published - 2026
Externally published: Yes

Keywords

  • NeRF representation
  • Volumetric video streaming
  • cross-layer coordination
  • frame drop
