ACM Transactions on Graphics: Latest Articles, Page 9

Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-04-27 | DOI: 10.1145/3656374

Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike comparisons across different research papers, differences in results here are due only to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field.

The evaluation results show that some synthetic gesture conditions were rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion was found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall’s tau rank correlation of around −0.5. Based on the challenge results, we formulate numerous recommendations for system building and evaluation.
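Kendall's tau measures how consistently two scores rank the same set of items. Below is a minimal sketch of the tau-a variant, assuming no tied values; the per-system numbers are made up for illustration and are not challenge results. Because a lower FGD is better, a negative tau between FGD and human-likeness ratings means the metric and the ratings tend to agree. (SciPy's `scipy.stats.kendalltau` provides a tie-aware tau-b for real use.)

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a rank correlation between paired scores (assumes no ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # Same sign of differences -> both scores order this pair the same way
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-system scores (NOT challenge data); lower FGD is better.
fgd = [10.0, 20.0, 30.0, 40.0]
human_likeness = [60.0, 55.0, 40.0, 45.0]
print(kendall_tau_a(fgd, human_likeness))  # -0.6666666666666666
```

Here one of the six system pairs is concordant and five are discordant, giving (1 − 5) / 6.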

Citations: 0

Differentiable solver for time-dependent deformation problems with contact

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-04-26 | DOI: 10.1145/3657648

Zizhou Huang, Davi Colli Tozoni, Arvi Gjoka, Zachary Ferguson, Teseo Schneider, Daniele Panozzo, Denis Zorin

We introduce a general differentiable solver for time-dependent deformation problems with contact and friction. Our approach combines a finite element discretization and a high-order time integrator with the recently proposed incremental potential contact (IPC) method for handling contact and friction forces, and uses them to solve ODE- and PDE-constrained optimization problems on scenes with complex geometry. It supports both static and dynamic problems, as well as differentiation with respect to all physical parameters in the problem description, including shape, material parameters, friction parameters, and initial conditions. Our analytically derived adjoint formulation is efficient, adding only a small overhead over the forward simulation (typically less than 10% for nonlinear problems), and shares many similarities with the forward problem, which allows large parts of existing forward simulator code to be reused.

We implement our approach on top of the open-source PolyFEM library and demonstrate the applicability of our solver to shape design, initial condition optimization, and material estimation on both simulated results and physical validations.
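The adjoint idea can be illustrated on a toy problem far simpler than the paper's: there is no FEM, no contact, and no PolyFEM API here. The sketch takes a scalar ODE x' = −k x stepped with implicit Euler and computes dJ/dk for J = 0.5 (x_N − target)^2 with one backward sweep. The backward pass solves with the same step Jacobian (1 + k h) as the forward pass, which mirrors the forward/adjoint similarity described above. All function names and values are hypothetical.

```python
def forward(x0, k, h, n_steps):
    """Implicit Euler for x' = -k x. Returns the trajectory [x_0, ..., x_N]."""
    xs = [x0]
    for _ in range(n_steps):
        # Step residual R_{n+1} = (1 + k h) x_{n+1} - x_n = 0, solved for x_{n+1}
        xs.append(xs[-1] / (1.0 + k * h))
    return xs


def loss(xs, target):
    """Objective J = 0.5 * (x_N - target)^2 on the final state."""
    return 0.5 * (xs[-1] - target) ** 2


def adjoint_grad(xs, k, h, target):
    """dJ/dk from one backward sweep over the stored forward trajectory."""
    jac = 1.0 + k * h                  # dR_n/dx_n: same "system matrix" the forward step solves with
    lam = (xs[-1] - target) / jac      # terminal condition: jac * lam_N = dJ/dx_N
    grad = 0.0
    for n in range(len(xs) - 1, 0, -1):
        grad -= lam * h * xs[n]        # accumulate -lam_n * dR_n/dk, where dR_n/dk = h * x_n
        lam = lam / jac                # jac * lam_{n-1} = lam_n  (since dR_n/dx_{n-1} = -1)
    return grad


# Usage: compare the adjoint gradient with a central finite difference.
x0, k, h, steps, target = 1.0, 0.5, 0.1, 20, 0.2
g = adjoint_grad(forward(x0, k, h, steps), k, h, target)
eps = 1e-6
fd = (loss(forward(x0, k + eps, h, steps), target)
      - loss(forward(x0, k - eps, h, steps), target)) / (2 * eps)
print(g, fd)
```

The backward sweep costs about as much as the forward one regardless of how many parameters are differentiated, which is why adjoint methods scale well to many parameters.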

Citations: 0

Real-Time Neural Appearance Models

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-04-20 | DOI: 10.1145/3659577

Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn

We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations.

Our appearance model uses learned hierarchical textures that are interpreted by neural decoders, which produce reflectance values and importance-sampled directions. To make the best use of the decoders' modeling capacity, we equip them with two graphics priors. The first prior, the transformation of directions into learned shading frames, facilitates accurate reconstruction of mesoscale effects. The second prior, a microfacet sampling distribution, allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows deeply layered material graphs to be baked into a compact, unified neural representation.
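A widely used microfacet sampling distribution is the isotropic GGX (Trowbridge-Reitz) normal distribution. The abstract does not say which distribution the decoder relies on, so GGX is an assumption here. The minimal sketch samples a local-frame half-vector with pdf D(h) cos θ_h by inverting its CDF tan²θ / (α² + tan²θ); the anisotropic variant and the change of variables to an incident-direction pdf (dividing by 4|ω_o · h|) are omitted.

```python
import math


def ggx_ndf(cos_theta, alpha):
    """Isotropic GGX (Trowbridge-Reitz) normal distribution D(h); cos_theta = h . n."""
    if cos_theta <= 0.0:
        return 0.0
    cos2 = cos_theta * cos_theta
    tan2 = (1.0 - cos2) / cos2
    return alpha ** 2 / (math.pi * cos2 * cos2 * (alpha ** 2 + tan2) ** 2)


def sample_ggx_half_vector(u1, u2, alpha):
    """Map (u1, u2) in [0,1)^2 to a local-frame half-vector h with pdf D(h) * cos(theta_h)."""
    tan2 = alpha * alpha * u1 / (1.0 - u1)      # inverse of the CDF tan^2 / (alpha^2 + tan^2)
    cos_theta = 1.0 / math.sqrt(1.0 + tan2)
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    phi = 2.0 * math.pi * u2
    h = (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)
    return h, ggx_ndf(cos_theta, alpha) * cos_theta
```

Keeping such an analytic lobe as a prior means the network only needs to predict its parameters, rather than learn to invert a CDF itself.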

By exposing hardware-accelerated tensor operations to ray tracing shaders, we show that the neural decoders can be inlined and executed efficiently inside a real-time path tracer. We analyze scalability as the number of neural materials increases and propose improving performance with code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens the door to film-quality visuals in real-time applications such as games and live previews.
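At its core, the work inlined into the path tracer is a small MLP evaluated per shading point. The hypothetical NumPy sketch below batches such a decoder over shading points. The latent width, hidden size, raw-direction inputs, and `exp` output activation are all assumptions, not the paper's architecture, and the weights are random rather than trained. A real renderer would run this inside the shader with hardware tensor operations, not in NumPy.

```python
import numpy as np


class TinyDecoder:
    """Hypothetical 2-layer MLP: [latent, w_i, w_o] -> RGB reflectance (untrained)."""

    def __init__(self, latent_dim=8, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = latent_dim + 6                      # latent code + two 3D directions
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, latent, wi, wo):
        """latent: (n, latent_dim); wi, wo: (n, 3) directions in the shading frame."""
        x = np.concatenate([latent, wi, wo], axis=-1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)   # ReLU hidden layer
        return np.exp(h @ self.w2 + self.b2)         # exp keeps reflectance positive


dec = TinyDecoder()
rng = np.random.default_rng(1)
rgb = dec(rng.normal(size=(5, 8)), rng.normal(size=(5, 3)), rng.normal(size=(5, 3)))
print(rgb.shape)  # (5, 3)
```

Evaluating many shading points as one matrix product is what makes coherent execution fast; divergent rays hitting different materials break that batching.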

Citations: 0

ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-04-16 | DOI: 10.1145/3659578

Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or

Recent text-to-image generative models have enabled us to transform our words into vibrant, captivating imagery. The surge of personalization techniques that followed has also allowed us to imagine unique concepts in new scenes. However, an intriguing question remains: how can we generate a new, imaginary concept that has never been seen before? In this paper, we present the task of creative text-to-image generation, where we seek to generate new members of a broad category (e.g., generating a pet that differs from all existing pets). We leverage the under-studied diffusion prior models and show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior, resulting in a set of “prior constraints”. To keep our generated concept from converging to existing members, we incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly unique creations. Finally, we show that our prior constraints can also serve as a strong mixing mechanism that allows us to create hybrids of generated concepts, introducing even more flexibility into the creative process.
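The sketch below is a heavily simplified illustration of what a set of prior constraints might look like. It assumes each constraint is a cosine similarity in a CLIP-like embedding space: the optimized embedding `v` is pulled toward the broad category and pushed away from every existing member named so far. The embeddings, the weighting, and the averaging are hypothetical and are not the paper's exact loss. No diffusion prior or VLM is modeled; in the adaptive loop, each VLM answer would simply append one more vector to `negatives`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def prior_constraint_loss(v, category, negatives, neg_weight=1.0):
    """Hypothetical objective over a prior-output embedding v (lower is better).

    Positive constraint: be similar to the broad category embedding.
    Negative constraints: be dissimilar, on average, to existing members named so far.
    """
    loss = -cosine(v, category)
    if negatives:
        loss += neg_weight * sum(cosine(v, n) for n in negatives) / len(negatives)
    return loss


# Toy 3D "embeddings" (hypothetical): both candidates are equally close to the category,
# so the negative constraint decides which one the optimization prefers.
category = [1.0, 1.0, 0.0]
negatives = [[0.0, 1.0, 0.0]]   # stands in for an existing member the VLM identified
print(prior_constraint_loss([1.0, 0.2, 0.0], category, negatives))
print(prior_constraint_loss([0.2, 1.0, 0.0], category, negatives))
```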

Citations: 0

DMHomo: Learning Homography with Diffusion Models

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-03-11 | DOI: 10.1145/3652207

Haipeng Li, Hai Jiang, Ao Luo, Ping Tan, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

Supervised homography estimation methods are hampered by the lack of adequate labeled training data. To address this issue, we propose DMHomo, a diffusion-model-based framework for supervised homography learning. The framework generates image pairs with accurate labels, realistic image content, and realistic interval motion, so that they form adequate training pairs. We use unlabeled image pairs with pseudo-labels, such as homographies and dominant-plane masks computed with existing methods, to train a diffusion model that generates a supervised training dataset. To further enhance performance, we introduce a new probabilistic mask loss, which identifies outlier regions through supervised training, and an iterative mechanism that optimizes the generative and homography models in turn. Our experimental results demonstrate that DMHomo effectively overcomes the scarcity of qualified datasets in supervised homography learning and improves generalization to real-world scenes. The code and dataset are available at: https://github.com/lhaippp/DMHomo
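For contrast with the realistic pairs emphasized above, here is a sketch of the conventional way to obtain exact homography labels: perturb the four corners of a patch and solve for H from the four correspondences (the 4-point parameterization used by earlier deep homography work). This is the baseline labeling idea, not DMHomo's diffusion-based generator, and the patch size and shift range are arbitrary choices.

```python
import numpy as np


def homography_from_4pts(src, dst):
    """Solve for the 3x3 homography H (with H[2,2] = 1) mapping 4 src points onto 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_points(H, pts):
    """Apply H to an (n, 2) array of points using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def random_label(patch_size=128, max_shift=16.0, rng=None):
    """Perturb the patch corners; the resulting H is an exact label for the synthetic pair."""
    rng = np.random.default_rng() if rng is None else rng
    s = float(patch_size)
    src = np.array([[0.0, 0.0], [s, 0.0], [s, s], [0.0, s]])
    dst = src + rng.uniform(-max_shift, max_shift, size=(4, 2))
    return src, dst, homography_from_4pts(src, dst)


src, dst, H = random_label(rng=np.random.default_rng(0))
print(np.round(H, 3))
```

Warping one image with such an H yields a perfectly labeled pair, but the content is a single planar warp with no independent scene motion. That gap is what a generative approach aims to close.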

Citations: 0

Joint Stroke Tracing and Correspondence for 2D Animation

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-02-29 | DOI: 10.1145/3649890

Haoran Mo, Chengying Gao, Ruomei Wang

To reduce the human labor of redrawing keyframes with ordered vector strokes for automatic inbetweening, we propose, for the first time, a joint stroke tracing and correspondence approach. Given consecutive raster keyframes, along with a single vector image of the starting frame as guidance, the approach generates vector drawings for the remaining keyframes while ensuring one-to-one stroke correspondence. Although trained on clean line drawings, our framework generalizes to rough sketches, and the generated results can be imported into inbetweening systems to produce inbetween sequences. Hence, the method is compatible with the standard 2D animation workflow. An adaptive spatial transformation module (ASTM) is introduced to handle non-rigid motions and stroke distortion. We collect a training dataset of 10k+ pairs of raster frames and their vector drawings with stroke correspondence. Comprehensive validations on real clean and rough animated frames demonstrate the effectiveness of our method and its superiority over existing methods.
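To make "one-to-one stroke correspondence" concrete, here is a deliberately naive baseline, not the paper's learned method. Strokes are polylines of (x, y) points, and the code picks the permutation pairing strokes across two frames that minimizes the total centroid distance, by brute force. That is fine for a handful of strokes; larger inputs would call for the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) or a learned model.

```python
import math
from itertools import permutations


def centroid(stroke):
    """Mean point of a polyline given as a list of (x, y) tuples."""
    xs, ys = zip(*stroke)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def match_strokes(strokes_a, strokes_b):
    """Return perm such that strokes_a[i] corresponds to strokes_b[perm[i]].

    Brute-force search over all one-to-one assignments (O(n!)), minimizing the
    total distance between stroke centroids.
    """
    if len(strokes_a) != len(strokes_b):
        raise ValueError("one-to-one matching needs equal stroke counts")
    ca = [centroid(s) for s in strokes_a]
    cb = [centroid(s) for s in strokes_b]
    best, best_cost = None, math.inf
    for perm in permutations(range(len(cb))):
        cost = sum(math.dist(ca[i], cb[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)


# Frame B holds the same three strokes shifted by one unit and stored in a different order.
a = [[(0, 0), (1, 0)], [(10, 10), (11, 10)], [(0, 10), (0, 11)]]
b = [[(1, 10), (1, 11)], [(1, 0), (2, 0)], [(11, 10), (12, 10)]]
print(match_strokes(a, b))  # [1, 2, 0]
```

A centroid heuristic fails as soon as motion is large or strokes split and deform, which is the non-rigid regime the ASTM is designed to handle.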

Citations: 0

A Dual-Particle Approach for Incompressible SPH Fluids

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-02-29 | DOI: 10.1145/3649888

Shusen Liu, Xiaowei He, Yuzhong Guo, Yue Chang, Wencheng Wang

Tensile instability is one of the major obstacles for particle methods in fluid simulation: it causes particles to clump in pairs under tension and prevents the simulation from generating small-scale thin features. To address this issue, previous particle methods use either a background pressure or a finite difference scheme to alleviate particle-clustering artifacts, yet they still fail to produce small-scale thin features in free-surface flows. In this paper, we propose a dual-particle approach for simulating incompressible fluids. Our approach incorporates supplementary virtual particles designed to capture and store particle pressures. These pressure samples are systematically redistributed at each time step, based on the initial positions of the fluid particles. By doing so, we effectively reduce tensile instability in standard SPH by narrowing down the unstable regions for particles experiencing tensile stress. As a result, we can accurately simulate free-surface flows with rich small-scale thin features, such as droplets, streamlines, and sheets, as demonstrated by experimental results.
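For readers new to SPH, the sketch below shows the standard building blocks the method starts from: a smoothing kernel and the density summation over neighboring particles. It uses one common normalization of the 3D cubic spline kernel, with support radius h, and an O(n²) neighbor loop for clarity. The paper's virtual pressure particles and their redistribution are not shown.

```python
import math


def cubic_spline_W(r, h):
    """3D cubic spline kernel with support radius h (normalized to integrate to 1)."""
    q = r / h
    sigma = 8.0 / (math.pi * h ** 3)
    if q <= 0.5:
        return sigma * (6.0 * (q ** 3 - q ** 2) + 1.0)
    if q <= 1.0:
        return sigma * 2.0 * (1.0 - q) ** 3
    return 0.0


def densities(positions, masses, h):
    """Standard SPH density summation rho_i = sum_j m_j W(|x_i - x_j|, h)."""
    rho = []
    for xi in positions:
        rho.append(sum(m * cubic_spline_W(math.dist(xi, xj), h)
                       for xj, m in zip(positions, masses)))
    return rho


h = 0.1
print(densities([(0.0, 0.0, 0.0), (0.05, 0.0, 0.0)], [1.0, 1.0], h))
```

Pressure forces in standard SPH use the gradient of this kernel, and under negative (tensile) pressure that gradient pulls nearby particles together. This is the pairing artifact the abstract refers to.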

Citations: 0

HQ3DAvatar: High Quality Implicit 3D Head Avatar

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-02-29 | DOI: 10.1145/3649889

Kartik Teotia, Mallikarjun B R, Xingang Pan, Hyeongwoo Kim, Pablo Garrido, Mohamed Elgharib, Christian Theobalt

Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capturing full-head dynamic performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This paper presents a novel approach to building highly photorealistic digital head avatars. Our method learns a canonical space via an implicit function parameterized by a neural network. It leverages multiresolution hash encoding in the learned feature space, enabling faster, high-quality training and high-resolution rendering. At test time, our method is driven by a monocular RGB video. Here, an image encoder extracts face-specific features that also condition the learnable canonical space, which encourages deformation-dependent texture variations during training. We also propose a novel optical-flow-based loss that ensures correspondences in the learned canonical space, thus encouraging artifact-free and temporally consistent renderings. We show results on challenging facial expressions, as well as free-viewpoint renderings at interactive real-time rates at a resolution of 480×270. Our method outperforms related approaches both visually and numerically. We will release our multiple-identity dataset to encourage further research.
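Multiresolution hash encoding comes from Instant-NGP (Müller et al., 2022). Below is a small, untrained sketch of its lookup. At each level, the point is scaled to that level's grid resolution, the eight surrounding vertices are hashed into a fixed-size table with the XOR-of-primes spatial hash, and the stored features are trilinearly interpolated. The level count, table size, resolutions, and single scalar feature per entry are illustrative assumptions, not HQ3DAvatar's configuration.

```python
import math
import random

# Per-dimension primes of the Instant-NGP spatial hash (the first is 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """XOR-of-primes hash of an integer grid vertex into [0, table_size).

    Python ints do not wrap like uint32, but for a power-of-two table_size the
    result equals the uint32 version because only the low bits matter.
    """
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


class HashEncoding:
    def __init__(self, n_levels=4, table_size=2 ** 14, base_res=16, max_res=512, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        # Geometric growth of grid resolution from base_res to roughly max_res
        growth = math.exp((math.log(max_res) - math.log(base_res)) / (n_levels - 1))
        self.resolutions = [int(math.floor(base_res * growth ** l)) for l in range(n_levels)]
        # One scalar feature per table entry per level; training would update these.
        self.tables = [[rng.uniform(-1e-4, 1e-4) for _ in range(table_size)]
                       for _ in range(n_levels)]

    def encode(self, p):
        """p: point in [0, 1]^3 -> list with one interpolated feature per level."""
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            x = [c * res for c in p]
            base = [int(math.floor(c)) for c in x]
            frac = [c - b for c, b in zip(x, base)]
            value = 0.0
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        # Trilinear weight of this corner of the enclosing cell
                        w = ((frac[0] if dx else 1.0 - frac[0])
                             * (frac[1] if dy else 1.0 - frac[1])
                             * (frac[2] if dz else 1.0 - frac[2]))
                        idx = spatial_hash(base[0] + dx, base[1] + dy, base[2] + dz, self.table_size)
                        value += w * table[idx]
            feats.append(value)
        return feats


enc = HashEncoding()
print(enc.resolutions, enc.encode((0.3, 0.6, 0.9)))
```

In a full model, the concatenated per-level features feed a small MLP. Fine levels collide in the hash table, and training learns to resolve those collisions.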

Citations: 0

Online Neural Path Guiding with Normalized Anisotropic Spherical Gaussians

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-02-28 | DOI: 10.1145/3649310

Jiawei Huang, Akito Iizuka, Hajime Tanaka, Taku Komura, Yoshifumi Kitamura

Importance sampling techniques significantly reduce variance in physically based rendering. In this paper, we propose a novel online framework that learns the spatially varying distribution of the full product of the rendering equation with a single small neural network, using stochastic ray samples. The learned distributions can be used to efficiently sample the full product of incident light. To accomplish this, we introduce a novel closed-form density model, the Normalized Anisotropic Spherical Gaussian mixture, which can model a complex light field with a small number of parameters and can be sampled directly. Our framework progressively renders and learns the distribution without requiring any warm-up phase. Thanks to the compact and expressive representation of our density model, our framework can be implemented entirely on the GPU, allowing it to produce high-quality images with limited computational resources. The results show that our framework outperforms existing neural path guiding approaches and achieves performance comparable to, or even better than, state-of-the-art online statistical path guiding techniques.
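To illustrate what "closed-form and directly sampleable" buys, the sketch below uses the simpler isotropic case: a mixture of normalized von Mises-Fisher (spherical Gaussian) lobes. Its pdf can be evaluated exactly and sampled by inverting the CDF. This is a swapped-in, simpler density, not the paper's Normalized Anisotropic Spherical Gaussian, whose formula is not reproduced here. The lobe parameters are arbitrary.

```python
import math
import random


def vmf_pdf(cos_angle, kappa):
    """Normalized von Mises-Fisher lobe on the unit sphere, as a function of cos(angle to its axis)."""
    return kappa / (2.0 * math.pi * (1.0 - math.exp(-2.0 * kappa))) * math.exp(kappa * (cos_angle - 1.0))


def sample_vmf_cos(kappa, u):
    """Inverse-CDF sample of cos(angle to the axis); u must lie in (0, 1]."""
    return 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa


def basis(n):
    """Two unit vectors orthogonal to the unit vector n and to each other."""
    a = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    t = (a[1] * n[2] - a[2] * n[1], a[2] * n[0] - a[0] * n[2], a[0] * n[1] - a[1] * n[0])
    norm = math.sqrt(sum(c * c for c in t))
    t = tuple(c / norm for c in t)
    b = (n[1] * t[2] - n[2] * t[1], n[2] * t[0] - n[0] * t[2], n[0] * t[1] - n[1] * t[0])
    return t, b


def sample_mixture(lobes, rng):
    """lobes: list of (weight, unit_axis, kappa) with weights summing to 1."""
    u, acc = rng.random(), 0.0
    for weight, axis, kappa in lobes:      # pick a lobe by weight (last lobe if rounding)
        acc += weight
        if u < acc:
            break
    cos_t = sample_vmf_cos(kappa, 1.0 - rng.random())   # 1 - random() lies in (0, 1]
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
    phi = 2.0 * math.pi * rng.random()
    t, b = basis(axis)
    return tuple(sin_t * math.cos(phi) * t[i] + sin_t * math.sin(phi) * b[i] + cos_t * axis[i]
                 for i in range(3))


def mixture_pdf(d, lobes):
    """Exact solid-angle pdf of sample_mixture at unit direction d."""
    return sum(w * vmf_pdf(sum(di * ai for di, ai in zip(d, axis)), k) for w, axis, k in lobes)


rng = random.Random(0)
lobes = [(0.7, (0.0, 0.0, 1.0), 20.0), (0.3, (1.0, 0.0, 0.0), 5.0)]
d = sample_mixture(lobes, rng)
print(d, mixture_pdf(d, lobes))
```

Having both the exact pdf and a sampler is what lets a guided path tracer combine such a distribution with BSDF sampling via multiple importance sampling.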

Citations: 0

Importance Sampling BRDF Derivatives

IF 6.2 | CAS Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date: 2024-02-21 | DOI: 10.1145/3648611

Yash Belhe, Bing Xu, Sai Praveen Bangaru, Ravi Ramamoorthi, Tzu-Mao Li

We propose a set of techniques to efficiently importance sample the derivatives of a wide range of BRDF models. In differentiable rendering, BRDFs are replaced by their differential BRDF counterparts, which are real-valued and can take negative values. This introduces a new source of variance arising from their changes in sign. Real-valued functions cannot be perfectly importance sampled by a positive-valued PDF, and directly applying BRDF sampling leads to high variance. Previous attempts at antithetic sampling only addressed the derivative with respect to the roughness parameter of isotropic microfacet BRDFs. Our work generalizes BRDF derivative sampling to anisotropic microfacet models, mixture BRDFs, Oren-Nayar, Hanrahan-Krueger, and other analytic BRDFs.

Our method first decomposes the real-valued differential BRDF into a sum of single-signed functions, eliminating the variance caused by sign changes. Next, we importance sample each of the resulting single-signed functions separately. The first decomposition, positivization, partitions the real-valued function according to its sign and is effective at reducing variance when applicable. However, it requires the roots of the differential BRDF to be known analytically, and the differential BRDF to be analytically integrable. Our key insight is that the single-signed functions can have overlapping support, which significantly broadens the ways we can decompose a real-valued function. Our product and mixture decompositions exploit this property and allow us to support several BRDF derivatives that positivization could not handle. For a wide variety of BRDF derivatives, our method significantly reduces variance (up to 58x in some cases) at equal computation cost and enables better recovery of spatially varying textures through gradient-descent-based inverse rendering.
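The two kinds of decomposition described above can be tried on one-dimensional toys. The following Python sketch is our own illustration under stated assumptions, not the authors' implementation: `f(x) = cos(x)` on [0, π] is a hypothetical stand-in for a differential BRDF whose root and integrals are known analytically (the case where positivization applies), and the derivative of a zero-mean Gaussian with respect to its width σ is a hypothetical stand-in for a lobe differentiated with respect to a roughness-like parameter, split into two positive parts with overlapping support. All function names are made up for the example.

```python
import math
import random
import statistics

# Hypothetical 1D toys (NOT the paper's BRDFs) illustrating how splitting a
# sign-changing integrand into single-signed parts removes sign-change variance.

# --- Toy 1: positivization -------------------------------------------------
# f(x) = cos(x) on [0, pi] has an analytic root at pi/2 and integrates to 0.
# f_plus = f on [0, pi/2] and f_minus = -f on [pi/2, pi] each integrate to 1.
I_PLUS = I_MINUS = 1.0


def f(x: float) -> float:
    return math.cos(x)


def estimate_uniform(rng: random.Random) -> float:
    # Sign-agnostic baseline: p(x) = 1/pi on [0, pi].
    x = rng.uniform(0.0, math.pi)
    return f(x) * math.pi


def estimate_positivized(rng: random.Random) -> float:
    # Inverse-CDF sampling of each single-signed part:
    #   p_plus  =  cos / I_PLUS  on [0, pi/2],  CDF = sin(x)
    #   p_minus = -cos / I_MINUS on [pi/2, pi], CDF = 1 - sin(x)
    x_plus = math.asin(rng.random())
    x_minus = math.pi - math.asin(1.0 - rng.random())
    w_plus = f(x_plus) / (f(x_plus) / I_PLUS)        # f_plus / p_plus
    w_minus = -f(x_minus) / (-f(x_minus) / I_MINUS)  # f_minus / p_minus
    return w_plus - w_minus


# --- Toy 2: overlapping single-signed parts --------------------------------
# For a zero-mean Gaussian N(x; sigma):
#   dN/dsigma = (1/sigma) * [(x^2 / sigma^2) * N  -  N].
# Both terms are positive densities covering the whole real line, so the
# roots at x = +/- sigma never have to be located.
# Target: d/dsigma E_N[x^2] = d/dsigma sigma^2 = 2 * sigma.
def g(x: float) -> float:
    return x * x


def estimate_score(rng: random.Random, sigma: float) -> float:
    # Sign-agnostic baseline: sample N, weight by (dN/dsigma) / N.
    x = rng.gauss(0.0, sigma)
    return g(x) * (x * x / sigma**3 - 1.0 / sigma)


def estimate_overlapping(rng: random.Random, sigma: float) -> float:
    # (x^2 / sigma^2) * N is the density of sigma * s * R, where R follows a
    # chi distribution with 3 degrees of freedom and s is a random sign.
    r = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)))
    x_pos = sigma * r * rng.choice((-1.0, 1.0))
    x_neg = rng.gauss(0.0, sigma)
    return (g(x_pos) - g(x_neg)) / sigma


rng = random.Random(7)
N, SIGMA = 20_000, 1.0
uni = [estimate_uniform(rng) for _ in range(N)]
pos = [estimate_positivized(rng) for _ in range(N)]
score = [estimate_score(rng, SIGMA) for _ in range(N)]
overlap = [estimate_overlapping(rng, SIGMA) for _ in range(N)]

for name, s in [("uniform", uni), ("positivized", pos),
                ("score", score), ("overlapping", overlap)]:
    print(f"{name:12s} mean={statistics.fmean(s):+.4f} "
          f"var={statistics.pvariance(s):.4f}")
```

In the first toy the positivized estimator returns exactly 0 on every sample, because each single-signed half is sampled in exact proportion to itself, while the uniform baseline is unbiased but noisy. In the second toy the difference of the two overlapping parts has a theoretical variance of 8 for σ = 1, against 74 for the score-weighted baseline, and the roots at x = ±σ are never needed.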

Citations: 0