SAILOR: Synergizing Radiance and Occupancy Fields for Live Human Performance Capture

Zheng Dong, Ke Xu, Yaoan Gao, Qilin Sun, Hujun Bao, Weiwei Xu, Rynson W. H. Lau
ACM Transactions on Graphics (TOG), pages 1-15. Published 2023-12-04. DOI: 10.1145/3618370 (https://doi.org/10.1145/3618370)

Abstract

Immersive user experiences in live VR/AR performances require fast and accurate free-view rendering of the performers. Existing methods are mainly based on Pixel-aligned Implicit Functions (PIFu) or Neural Radiance Fields (NeRF). However, while PIFu-based methods usually fail to produce photorealistic view-dependent textures, NeRF-based methods typically lack local geometry accuracy and are computationally heavy (e.g., they require dense sampling of 3D points, additional fine-tuning, or pose estimation). In this work, we propose a novel generalizable method, named SAILOR, to create high-quality human free-view videos from very sparse RGBD live streams. To produce view-dependent textures while preserving locally accurate geometry, we integrate PIFu and NeRF such that they work synergistically by conditioning the PIFu on depth and then rendering view-dependent textures through NeRF. Specifically, we propose a novel network, named SRONet, for this hybrid representation. SRONet can handle unseen performers without fine-tuning. In addition, a neural blending-based ray interpolation approach, a tree-based voxel-denoising scheme, and a parallel computing pipeline are incorporated to reconstruct and render live free-view videos at 10 fps on average. To evaluate rendering performance, we construct a real-captured RGBD benchmark of 40 performers. Experimental results show that SAILOR outperforms existing human reconstruction and performance capture methods.
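
The abstract does not give SRONet's equations, so the following is only a generic sketch of how an occupancy field and view-dependent colors could be combined along a single camera ray. It uses standard front-to-back alpha compositing and assumes, purely for illustration, that each sample's occupancy value is used directly as its opacity. The function name `composite_ray` and its inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]


def composite_ray(
    occupancies: Sequence[float],
    colors: Sequence[Color],
    min_transmittance: float = 1e-4,
) -> Tuple[Color, float]:
    """Front-to-back alpha compositing of samples along one ray.

    occupancies: per-sample occupancy in [0, 1], ordered near to far.
        Treating occupancy as opacity is an assumption of this sketch.
    colors: per-sample RGB, e.g. predicted for the current view direction.
    Returns the composited RGB and the accumulated opacity of the ray.
    """
    if len(occupancies) != len(colors):
        raise ValueError("occupancies and colors must have the same length")

    transmittance = 1.0  # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    for occ, color in zip(occupancies, colors):
        alpha = min(max(occ, 0.0), 1.0)  # clamp to a valid opacity
        weight = transmittance * alpha   # contribution of this sample
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
        if transmittance <= min_transmittance:
            break  # ray is effectively opaque; skip remaining samples
    return (rgb[0], rgb[1], rgb[2]), 1.0 - transmittance


if __name__ == "__main__":
    # Empty space, then a solid green surface that hides the blue sample.
    pixel, opacity = composite_ray(
        [0.0, 1.0, 1.0],
        [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
    )
    print(pixel, opacity)
```

The early exit once the ray is nearly opaque hints at why accurate geometry can reduce the dense point sampling that the abstract cites as a cost of NeRF-based methods: samples behind a confidently occupied surface contribute almost nothing.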