{"title":"Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos","authors":"Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu","doi":"arxiv-2409.08353","DOIUrl":null,"url":null,"abstract":"Volumetric video represents a transformative advancement in visual media,\nenabling users to freely navigate immersive virtual experiences and narrowing\nthe gap between digital and real worlds. However, the need for extensive manual\nintervention to stabilize mesh sequences and the generation of excessively\nlarge assets in existing workflows impedes broader adoption. In this paper, we\npresent a novel Gaussian-based approach, dubbed \\textit{DualGS}, for real-time\nand high-fidelity playback of complex human performance with excellent\ncompression ratios. Our key idea in DualGS is to separately represent motion\nand appearance using the corresponding skin and joint Gaussians. Such an\nexplicit disentanglement can significantly reduce motion redundancy and enhance\ntemporal coherence. We begin by initializing the DualGS and anchoring skin\nGaussians to joint Gaussians at the first frame. Subsequently, we employ a\ncoarse-to-fine training strategy for frame-by-frame human performance modeling.\nIt includes a coarse alignment phase for overall motion prediction as well as a\nfine-grained optimization for robust tracking and high-fidelity rendering. To\nintegrate volumetric video seamlessly into VR environments, we efficiently\ncompress motion using entropy encoding and appearance using codec compression\ncoupled with a persistent codebook. Our approach achieves a compression ratio\nof up to 120 times, only requiring approximately 350KB of storage per frame. We\ndemonstrate the efficacy of our representation through photo-realistic,\nfree-view experiences on VR headsets, enabling users to immersively watch\nmusicians in performance and feel the rhythm of the notes at the performers'\nfingertips.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"49 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08353","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Volumetric video represents a transformative advancement in visual media,
enabling users to freely navigate immersive virtual experiences and narrowing
the gap between digital and real worlds. However, the need for extensive manual
intervention to stabilize mesh sequences and the generation of excessively
large assets in existing workflows impede broader adoption. In this paper, we
present a novel Gaussian-based approach, dubbed DualGS, for real-time
and high-fidelity playback of complex human performance with excellent
compression ratios. Our key idea in DualGS is to represent motion and
appearance separately, with joint Gaussians capturing motion and skin
Gaussians capturing appearance. Such an
explicit disentanglement can significantly reduce motion redundancy and enhance
temporal coherence. We begin by initializing DualGS and anchoring skin
Gaussians to joint Gaussians at the first frame.
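The abstract contains no code, but the representation is concrete enough to sketch. Below is a minimal Python/NumPy illustration of how such a dual set of Gaussians might be organized, with sparse joint Gaussians carrying motion and dense skin Gaussians carrying appearance; the class layout, the k-nearest anchoring, and the translation-only blending are our assumptions, not the authors' implementation (which would also track rotations, scales, and view-dependent appearance).

```python
# Illustrative sketch only, not the authors' code: sparse "joint" Gaussians
# carry motion, dense "skin" Gaussians carry appearance, and each skin
# Gaussian is anchored to its nearest joints once at the first frame.
import numpy as np

class DualGS:
    def __init__(self, joint_xyz, skin_xyz, skin_color, k=4):
        self.skin_color = skin_color  # (N, 3) appearance payload (stand-in for SH, etc.)
        # Anchor each skin Gaussian to its k nearest joint Gaussians (frame 0).
        d = np.linalg.norm(skin_xyz[:, None] - joint_xyz[None], axis=-1)   # (N, J)
        self.anchors = np.argsort(d, axis=1)[:, :k]                        # (N, k)
        w = 1.0 / (np.take_along_axis(d, self.anchors, axis=1) + 1e-8)
        self.weights = w / w.sum(axis=1, keepdims=True)                    # (N, k)
        # Rest-frame offsets from each anchor joint to the skin Gaussian.
        self.offsets = skin_xyz[:, None] - joint_xyz[self.anchors]         # (N, k, 3)

    def deform(self, joint_xyz_t):
        """Predict skin Gaussian means at frame t from the moved joints."""
        moved = joint_xyz_t[self.anchors] + self.offsets                   # (N, k, 3)
        return (self.weights[..., None] * moved).sum(axis=1)               # (N, 3)
```

Because the anchoring is fixed at the first frame, later frames only need updated joint Gaussian parameters to drag the skin Gaussians along, which is where the claimed reduction in motion redundancy comes from.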
Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame
human performance modeling.
It includes a coarse alignment phase for overall motion prediction as well as a
fine-grained optimization for robust tracking and high-fidelity rendering.
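Read literally, this suggests a two-phase optimization per frame. The sketch below (PyTorch) shows one plausible schedule; the optimizer choice, learning rates, iteration counts, and the differentiable `render_loss` callable are placeholders rather than the paper's actual recipe.

```python
# Hypothetical per-frame coarse-to-fine schedule; all hyperparameters are
# placeholders. joint_params / skin_params are leaf tensors with
# requires_grad=True, and render_loss is a differentiable rendering loss.
import torch

def optimize_frame(joint_params, skin_params, render_loss,
                   coarse_iters=200, fine_iters=600):
    # Coarse phase: only the joint Gaussians move; the anchored skin
    # Gaussians follow, giving a robust estimate of the overall motion.
    opt = torch.optim.Adam([joint_params], lr=1e-3)
    for _ in range(coarse_iters):
        opt.zero_grad()
        render_loss(joint_params, skin_params).backward()
        opt.step()

    # Fine phase: refine everything for temporally coherent tracking and
    # high-fidelity appearance.
    opt = torch.optim.Adam([joint_params, skin_params], lr=2e-4)
    for _ in range(fine_iters):
        opt.zero_grad()
        render_loss(joint_params, skin_params).backward()
        opt.step()
    return joint_params, skin_params
```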
To integrate volumetric video seamlessly into VR environments, we efficiently
compress motion using entropy encoding and appearance using codec compression
coupled with a persistent codebook.
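As a rough illustration of the two compression paths, the sketch below quantizes and entropy-codes per-frame motion residuals and vector-quantizes appearance features against a sequence-wide codebook; `zlib` merely stands in for a real entropy encoder, and the function names and quantization step are our inventions.

```python
# Illustrative compression paths, not the authors' codec.
import zlib
import numpy as np

def compress_motion(joint_xyz_t, joint_xyz_prev, step=1e-3):
    # Quantize the frame-to-frame motion residual, then entropy-code it.
    residual = np.round((joint_xyz_t - joint_xyz_prev) / step).astype(np.int16)
    return zlib.compress(residual.tobytes())

def compress_appearance(features, codebook):
    # Store only nearest-codebook-entry indices per Gaussian; the codebook
    # itself persists across the sequence and is stored once.
    # (uint16 indices assume at most 65,536 codebook entries.)
    d = np.linalg.norm(features[:, None] - codebook[None], axis=-1)  # (N, K)
    return zlib.compress(d.argmin(axis=1).astype(np.uint16).tobytes())
```

For scale, a 120× ratio at roughly 350 KB per frame corresponds to a raw footprint on the order of 42 MB per frame, assuming the ratio is measured against the uncompressed per-frame Gaussian payload.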
Our approach achieves a compression ratio of up to 120 times, requiring only
approximately 350 KB of storage per frame. We demonstrate the efficacy of our
representation through photo-realistic,
free-view experiences on VR headsets, enabling users to immersively watch
musicians in performance and feel the rhythm of the notes at the performers'
fingertips.