HQ3DAvatar:高质量隐式 3D 头像

IF 7.8 1区 计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING ACM Transactions on Graphics Pub Date : 2024-02-29 DOI:10.1145/3649889
Kartik Teotia, Mallikarjun B R, Xingang Pan, Hyeongwoo Kim, Pablo Garrido, Mohamed Elgharib, Christian Theobalt
{"title":"HQ3DAvatar:高质量隐式 3D 头像","authors":"Kartik Teotia, Mallikarjun B R, Xingang Pan, Hyeongwoo Kim, Pablo Garrido, Mohamed Elgharib, Christian Theobalt","doi":"10.1145/3649889","DOIUrl":null,"url":null,"abstract":"<p>Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capture full head dynamic performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This paper presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, allowing for high-quality, faster training and high-resolution rendering. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space. This encourages deformation-dependent texture variations during training. We also propose a novel optical flow based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We show results on challenging facial expressions and show free-viewpoint renderings at interactive real-time rates for a resolution of 480<i>x</i>270. Our method outperforms related approaches both visually and numerically. We will release our multiple-identity dataset to encourage further research.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":null,"pages":null},"PeriodicalIF":7.8000,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HQ3DAvatar: High Quality Implicit 3D Head Avatar\",\"authors\":\"Kartik Teotia, Mallikarjun B R, Xingang Pan, Hyeongwoo Kim, Pablo Garrido, Mohamed Elgharib, Christian Theobalt\",\"doi\":\"10.1145/3649889\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capture full head dynamic performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This paper presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, allowing for high-quality, faster training and high-resolution rendering. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space. This encourages deformation-dependent texture variations during training. We also propose a novel optical flow based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We show results on challenging facial expressions and show free-viewpoint renderings at interactive real-time rates for a resolution of 480<i>x</i>270. Our method outperforms related approaches both visually and numerically. We will release our multiple-identity dataset to encourage further research.</p>\",\"PeriodicalId\":50913,\"journal\":{\"name\":\"ACM Transactions on Graphics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":7.8000,\"publicationDate\":\"2024-02-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Graphics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3649889\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3649889","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

摘要

最近,多视角体积渲染技术在建模和合成高质量头部虚拟形象方面显示出巨大潜力。捕捉完整头部动态表现的常见方法是使用基于网格的模板或基于三维立方体的图形基元来跟踪底层几何体。虽然这些基于模型的方法取得了可喜的成果,但它们往往无法学习复杂的几何细节,如嘴巴内部、头发和随时间发生的拓扑变化。本文介绍了一种构建高度逼真的数字头像的新方法。我们的方法通过神经网络参数化的隐式函数来学习典型空间。它利用所学特征空间中的多分辨率哈希编码,实现了高质量、更快的训练和高分辨率渲染。测试时,我们的方法由单目 RGB 视频驱动。在这里,图像编码器会提取特定的面部特征,这些特征也是可学习的典型空间的条件。这样就能在训练过程中鼓励随形变而变化的纹理。我们还提出了一种新颖的基于光流的损耗,它能确保所学典型空间中的对应关系,从而促进无伪影和时间一致的渲染。我们展示了具有挑战性的面部表情结果,并以交互式实时速率展示了分辨率为 480x270 的自由视点渲染。我们的方法在视觉和数值上都优于相关方法。我们将发布我们的多重身份数据集,以鼓励进一步的研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HQ3DAvatar: High Quality Implicit 3D Head Avatar

Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capture full head dynamic performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This paper presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, allowing for high-quality, faster training and high-resolution rendering. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space. This encourages deformation-dependent texture variations during training. We also propose a novel optical flow based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We show results on challenging facial expressions and show free-viewpoint renderings at interactive real-time rates for a resolution of 480x270. Our method outperforms related approaches both visually and numerically. We will release our multiple-identity dataset to encourage further research.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
ACM Transactions on Graphics
ACM Transactions on Graphics 工程技术-计算机:软件工程
CiteScore
14.30
自引率
25.80%
发文量
193
审稿时长
12 months
期刊介绍: ACM Transactions on Graphics (TOG) is a peer-reviewed scientific journal that aims to disseminate the latest findings of note in the field of computer graphics. It has been published since 1982 by the Association for Computing Machinery. Starting in 2003, all papers accepted for presentation at the annual SIGGRAPH conference are printed in a special summer issue of the journal.
期刊最新文献
PhysFiT: Physical-aware 3D Shape Understanding for Finishing Incomplete Assembly Synchronized tracing of primitive-based implicit volumes TriHuman : A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis DAMO: A Deep Solver for Arbitrary Marker Configuration in Optical Motion Capture RNA: Relightable Neural Assets
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1