
Latest Publications in ACM Transactions on Graphics

PhysFiT: Physical-aware 3D Shape Understanding for Finishing Incomplete Assembly
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-10-29 | DOI: 10.1145/3702226
Weihao Wang, Mingyu You, Hongjun Zhou, Bin He
Understanding the part composition and structure of 3D shapes is crucial for a wide range of 3D applications, including 3D part assembly and 3D assembly completion. Compared to 3D part assembly, 3D assembly completion is more complicated: it involves repairing broken or incomplete furniture that is missing several parts, using a toolkit. The primary challenge lies in how to reveal the potential part relations in order to infer the absent parts from multiple indistinguishable candidates with similar geometries, and to produce well-connected, structurally stable, and aesthetically pleasing assemblies. This task necessitates not only specialized knowledge of part composition but, more importantly, an awareness of physical constraints, i.e., connectivity, stability, and symmetry. Neglecting these constraints often results in assemblies that, although visually plausible, are impractical. To address this challenge, we propose PhysFiT, a physical-aware 3D shape understanding framework. This framework is built upon attention-based part relation modeling and incorporates connection modeling, simulation-free stability optimization, and symmetric transformation consistency. We evaluate its efficacy on 3D part assembly and 3D assembly completion, a novel assembly task presented in this work. Extensive experiments demonstrate the effectiveness of PhysFiT in constructing geometrically sound and physically compliant assemblies.
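The abstract does not detail how PhysFiT's simulation-free stability term is computed; as a rough illustration of what a stability check without dynamic simulation can look like, the sketch below tests whether an assembly's weighted center of mass projects inside the convex hull of its ground contacts. All names and numbers are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: a common simulation-free proxy for static
# stability -- the combined center of mass of all parts should project
# inside the convex hull of the ground-contact footprint. The function
# names (is_statically_stable, etc.) are hypothetical, not PhysFiT's API.
import numpy as np

def convex_hull_2d(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return np.array(pts)
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def point_in_hull(p, hull):
    """True if 2D point p lies inside the CCW-ordered hull."""
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        if (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) < 0:
            return False
    return True

def is_statically_stable(part_centers, part_masses, contact_points_xz):
    """Project the assembly's center of mass onto the ground plane (y up)
    and test it against the support polygon spanned by the contacts."""
    com = np.average(part_centers, axis=0, weights=part_masses)
    hull = convex_hull_2d(contact_points_xz)
    return point_in_hull(np.array([com[0], com[2]]), hull)

# Toy example: a table top and two legs, four ground contacts.
centers = np.array([[0.0, 0.8, 0.0], [0.4, 0.4, 0.4], [-0.4, 0.4, -0.4]])
masses = np.array([4.0, 1.0, 1.0])
contacts = np.array([[0.5, 0.5], [0.5, -0.5], [-0.5, -0.5], [-0.5, 0.5]])
print(is_statically_stable(centers, masses, contacts))  # True
```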
Citations: 0
Synchronized tracing of primitive-based implicit volumes
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-10-28 | DOI: 10.1145/3702227
Cédric Zanni
Implicit volumes are known for their ability to represent smooth shapes of arbitrary topology thanks to hierarchical combinations of primitives using a structure called a blobtree. We present a new tile-based rendering pipeline well suited for modeling scenarios, i.e., no preprocessing is required when primitive parameters are updated. When using approximate signed distance fields (fields with Lipschitz bound close to 1), we rely on compact, smooth CSG operators - extended from standard bounded operators - to compute a tight augmented bounding volume for all primitives of the blobtree. The pipeline relies on a low-resolution A-buffer storing the primitives of interest of a given screen tile. The A-buffer is then used during ray processing to synchronize threads within a subfrustum. This allows coherent field evaluation within workgroups. We use a sparse bottom-up tree traversal to prune the blobtree on-the-fly which allows us to decorrelate field evaluation complexity from the full blobtree size. The ray processing itself is done using the sphere tracing algorithm. The pipeline scales well to volumes consisting of thousands of primitives.
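The final ray-processing step is standard sphere tracing; a minimal Python sketch of that step is shown below, with a two-primitive smooth-union field standing in for a real blobtree. The tile-based A-buffer, thread synchronization, and bottom-up pruning described above are not reproduced.

```python
# Minimal sphere-tracing sketch over an approximate signed distance field.
# Only the core marching loop is shown; the paper's tile-based pipeline
# is not. The two-sphere smooth-min field stands in for a blobtree.
import math

def sd_sphere(p, center, radius):
    return math.dist(p, center) - radius

def smooth_min(a, b, k=0.25):
    """Polynomial smooth minimum; a standard smooth CSG union."""
    h = max(k - abs(a - b), 0.0) / k
    return min(a, b) - h * h * k * 0.25

def scene_sdf(p):
    return smooth_min(sd_sphere(p, (-0.3, 0.0, 2.0), 0.5),
                      sd_sphere(p, (0.4, 0.1, 2.2), 0.4))

def sphere_trace(origin, direction, t_max=10.0, eps=1e-4, max_steps=256):
    """March along the ray by the (Lipschitz-bounded) distance value."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = scene_sdf(p)
        if d < eps:          # hit: surface within tolerance
            return t
        t += d               # safe step: no surface closer than d
        if t > t_max:
            break
    return None              # miss

t_hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(t_hit)   # prints the hit distance (roughly 1.6 for this toy scene)
```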
Citations: 0
TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-09-24 | DOI: 10.1145/3697140
Heming Zhu, Fangneng Zhan, Christian Theobalt, Marc Habermann
Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model or a skeleton, to map points into a pose canonical space while conditioning the NeRF on the skeletal pose. These approaches typically parameterize the neural field with a multi-layer perceptron (MLP), leading to a slow runtime. To address this drawback, we propose TriHuman, a novel human-tailored, deformable, and efficient tri-plane representation, which achieves real-time performance, state-of-the-art pose-controllable geometry synthesis as well as photorealistic rendering quality. At the core, we non-rigidly warp global ray samples into our undeformed tri-plane texture space, which effectively addresses the problem of global points being mapped to the same tri-plane locations. We then show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes. Our results demonstrate a clear step towards higher quality in terms of geometry and appearance modeling of humans as well as runtime performance.
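The tri-plane idea itself can be sketched compactly: project a 3D point onto three axis-aligned planes, bilinearly sample a feature image on each, and decode the pooled feature with a small MLP. The NumPy sketch below shows only this generic lookup, not TriHuman's deformable, skeleton-conditioned variant; all shapes and weights are illustrative.

```python
# Generic tri-plane feature lookup (not the paper's architecture).
import numpy as np

R, C = 64, 16                                  # plane resolution, channels
rng = np.random.default_rng(0)
planes = rng.normal(size=(3, R, R, C)).astype(np.float32)   # xy, xz, yz
W1 = rng.normal(size=(C, 32)).astype(np.float32) * 0.1
W2 = rng.normal(size=(32, 3)).astype(np.float32) * 0.1      # e.g. RGB out

def bilinear(plane, u, v):
    """Sample a (R, R, C) plane at continuous coords u, v in [0, 1]."""
    x, y = u * (R - 1), v * (R - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, R - 1), min(y0 + 1, R - 1)
    fx, fy = x - x0, y - y0
    return ((1-fx)*(1-fy)*plane[y0, x0] + fx*(1-fy)*plane[y0, x1] +
            (1-fx)*fy*plane[y1, x0] + fx*fy*plane[y1, x1])

def triplane_decode(p):
    """p in [0,1]^3 -> pooled feature from three planes -> tiny MLP."""
    x, y, z = p
    feat = (bilinear(planes[0], x, y) +       # xy plane
            bilinear(planes[1], x, z) +       # xz plane
            bilinear(planes[2], y, z))        # yz plane
    h = np.maximum(feat @ W1, 0.0)            # ReLU hidden layer
    return h @ W2

print(triplane_decode(np.array([0.5, 0.25, 0.75])))
```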
Citations: 0
DAMO: A Deep Solver for Arbitrary Marker Configuration in Optical Motion Capture
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-09-14 | DOI: 10.1145/3695865
KyeongMin Kim, SeungWon Seo, DongHeun Han, HyeongYeop Kang
Marker-based optical motion capture (mocap) systems are increasingly utilized for acquiring 3D human motion, offering advantages in capturing the subtle nuances of human movement, style consistency, and ease of obtaining desired motion. Motion data acquisition via mocap typically requires laborious marker labeling and motion reconstruction; recent deep-learning solutions have aimed to automate this process. However, such solutions generally presuppose a fixed marker configuration to reduce learning complexity, thereby limiting flexibility. To overcome this limitation, we introduce DAMO, an end-to-end deep solver that proficiently infers arbitrary marker configurations and optimizes pose reconstruction. DAMO outperforms state-of-the-art solvers such as SOMA and MoCap-Solver in scenarios with significant noise and unknown marker configurations. We expect that DAMO will meet various practical demands such as facilitating dynamic marker configuration adjustments during capture sessions, processing marker clouds irrespective of whether they employ mixed or entirely unknown marker configurations, and allowing custom marker configurations to suit distinct capture scenarios. DAMO code and pretrained models are available at https://github.com/CritBear/damo.
Citations: 0
RNA: Relightable Neural Assets
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-09-13 | DOI: 10.1145/3695866
Krishna Mullia, Fujun Luan, Xin Sun, Miloš Hašan
High-fidelity 3D assets with materials composed of fibers (including hair), complex layered material shaders, or fine scattering geometry are critical in high-end realistic rendering applications. Rendering such models is computationally expensive due to heavy shaders and long scattering paths. Moreover, implementing the shading and scattering models is non-trivial and has to be done not only in the 3D content authoring software (which is necessarily complex), but also in all downstream rendering solutions. For example, web and mobile viewers for complex 3D assets are desirable, but frequently cannot support the full shading complexity allowed by the authoring application. Our goal is to design a neural representation for 3D assets with complex shading that supports full relightability and full integration into existing renderers. We provide an end-to-end shading solution at the first intersection of a ray with the underlying geometry. All shading and scattering is precomputed and included in the neural asset; no multiple scattering paths need to be traced, and no complex shading models need to be implemented to render our assets, beyond a single neural architecture. We combine an MLP decoder with a feature grid. Shading consists of querying a feature vector, followed by an MLP evaluation producing the final reflectance value. Our method provides high-fidelity shading, close to the ground-truth Monte Carlo estimate even at close-up views. We believe our neural assets could be used in practical renderers, providing significant speed-ups and simplifying renderer implementations.
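The shading query described above (fetch a feature vector, run one MLP evaluation) can be illustrated with a toy grid and decoder, as in the sketch below. The grid layout, direction encoding, and layer sizes are assumptions made for the example and are not the paper's actual architecture.

```python
# Toy "feature grid + MLP decoder" shading query: at the first ray hit,
# a feature vector is fetched from a learned grid and a small MLP maps it
# (together with light/view directions) to a reflectance value.
import numpy as np

G, C = 32, 8                                     # grid resolution, channels
rng = np.random.default_rng(1)
grid = rng.normal(size=(G, G, G, C)).astype(np.float32)
W1 = rng.normal(size=(C + 6, 64)).astype(np.float32) * 0.1
W2 = rng.normal(size=(64, 3)).astype(np.float32) * 0.1      # RGB reflectance

def trilinear(p):
    """Trilinearly interpolate the feature grid at p in [0, 1]^3."""
    x = np.asarray(p) * (G - 1)
    i0 = np.floor(x).astype(int)
    i1 = np.minimum(i0 + 1, G - 1)
    f = x - i0
    out = np.zeros(C, dtype=np.float32)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                idx = ((i1 if dx else i0)[0],
                       (i1 if dy else i0)[1],
                       (i1 if dz else i0)[2])
                out += w * grid[idx]
    return out

def shade(hit_point, view_dir, light_dir):
    """One end-to-end shading query: feature fetch, then MLP evaluation."""
    feat = np.concatenate([trilinear(hit_point), view_dir, light_dir])
    h = np.maximum(feat @ W1, 0.0)
    return h @ W2          # stands in for the precomputed-scattering output

rgb = shade([0.5, 0.2, 0.8], [0.0, 0.0, -1.0], [0.577, 0.577, 0.577])
print(rgb)
```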
Citations: 0
Speed-Aware Audio-Driven Speech Animation using Adaptive Windows
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-08-31 | DOI: 10.1145/3691341
Sunjin Jung, Yeongho Seol, Kwanggyoon Seo, Hyeonho Na, Seonghyeon Kim, Vanessa Tan, Junyong Noh
We present a novel method that can generate realistic speech animations of a 3D face from audio using multiple adaptive windows. In contrast to previous studies that use a fixed-size audio window, our method accepts an adaptive audio window as input, whose size reflects the speaking rate so that consistent phonemic information is used. Our system consists of three parts. First, the speaking rate is estimated from the input audio using a neural network trained in a self-supervised manner. Second, the appropriate window size that encloses the audio features is predicted adaptively based on the estimated speaking rate. Another key element lies in the use of multiple audio windows of different sizes as input to the animation generator: a small window to concentrate on detailed information and a large window to consider broad phonemic information near the center frame. Finally, the speech animation is generated from the multiple adaptive audio windows. Our method can generate realistic speech animations from in-the-wild audio at any speaking rate, i.e., fast raps, slow songs, as well as normal speech. We demonstrate via extensive quantitative and qualitative evaluations, including a user study, that our method outperforms state-of-the-art approaches.
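A toy version of the windowing logic is sketched below: a crude energy-peak count stands in for the paper's self-supervised speaking-rate network, and the window pair is sized inversely to the estimated rate. All constants and function names are invented for illustration.

```python
# Toy sketch of speed-aware adaptive windowing; not the paper's model.
import numpy as np

SR = 16000                    # audio sample rate (Hz)
HOP = 160                     # 10 ms hop for frame-level energy

def estimate_speaking_rate(audio):
    """Very rough rate proxy: count local energy peaks per second."""
    frames = audio[: len(audio) // HOP * HOP].reshape(-1, HOP)
    energy = (frames ** 2).mean(axis=1)
    thresh = energy.mean()
    peaks = np.sum((energy[1:-1] > energy[:-2]) &
                   (energy[1:-1] > energy[2:]) &
                   (energy[1:-1] > thresh))
    return peaks / (len(audio) / SR)             # "syllables" per second

def adaptive_windows(features, center, rate, base_frames=24, ref_rate=4.0):
    """Pick a window size inversely proportional to the speaking rate and
    return a (small, large) pair of feature windows around `center`."""
    size = int(np.clip(base_frames * ref_rate / max(rate, 1e-3), 8, 64))
    small, large = size // 2, size
    def crop(half):
        lo, hi = max(center - half, 0), min(center + half, len(features))
        return features[lo:hi]
    return crop(small // 2), crop(large // 2)

audio = np.random.default_rng(2).normal(size=SR * 2)          # 2 s of noise
feats = np.random.default_rng(3).normal(size=(200, 29))       # frame features
rate = estimate_speaking_rate(audio)
win_small, win_large = adaptive_windows(feats, center=100, rate=rate)
print(rate, win_small.shape, win_large.shape)
```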
Citations: 0
ControlMat: A Controlled Generative Approach to Material Capture
IF 6.2 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-08-27 | DOI: 10.1145/3688830
Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur
Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks. We present ControlMat, a method which, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials that could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space optimization methods, and we carefully validate our diffusion process design choices.
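Rolled diffusion for tileability can be illustrated independently of the trained model: circularly shift the latent between denoising steps so the wrap-around seam is repeatedly denoised in the interior. In the sketch below, `denoise_step` is a dummy stand-in for the real network, and all sizes are illustrative.

```python
# Sketch of the rolling trick only; the denoiser is a placeholder.
import numpy as np

rng = np.random.default_rng(4)

def denoise_step(latent, t):
    """Placeholder for one reverse-diffusion step of a trained model."""
    return latent * 0.98 + rng.normal(size=latent.shape) * 0.02 * t

def rolled_diffusion(shape=(64, 64, 4), steps=50):
    latent = rng.normal(size=shape)
    for t in np.linspace(1.0, 0.0, steps):
        dy, dx = rng.integers(0, shape[0]), rng.integers(0, shape[1])
        latent = np.roll(latent, (dy, dx), axis=(0, 1))    # roll seam inward
        latent = denoise_step(latent, t)
        latent = np.roll(latent, (-dy, -dx), axis=(0, 1))  # undo the shift
    return latent

tile = rolled_diffusion()
print(tile.shape)   # (64, 64, 4); with a real model this encourages
                    # consistency under wrap-around, i.e. tileability
```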
Citations: 0
Resolving Collisions in Dense 3D Crowd Animations
IF 7.8 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-08-10 | DOI: 10.1145/3687266
Gonzalo Gomez-Nogales, Melania Prieto-Martin, Cristian Romero, Marc Comino-Trinidad, Pablo Ramon-Prieto, A. Olivier, Ludovic Hoyet, Miguel Otaduy, J. Pettré, Dan Casas
We propose a novel contact-aware method to synthesize highly-dense 3D crowds of animated characters. Existing methods animate crowds by, first, computing the 2D global motion approximating subjects as 2D particles and, then, introducing individual character motions without considering their surroundings. This creates the illusion of a 3D crowd, but, with density, characters frequently intersect each other since character-to-character contact is not modeled. We tackle this issue and propose a general method that considers any crowd animation and resolves existing residual collisions. To this end, we take a physics-based approach to model contacts between articulated characters. This enables the real-time synthesis of 3D high-density crowds with dozens of individuals that do not intersect each other, producing an unprecedented level of physical correctness in animations. Under the hood, we model each individual using a parametric human body incorporating a set of 3D proxies to approximate their volume. We then build a large system of articulated rigid bodies, and use an efficient physics-based approach to solve for individual body poses that do not collide with each other while maintaining the overall motion of the crowd. We first validate our approach objectively and quantitatively. We then explore relations between physical correctness and perceived realism based on an extensive user study that evaluates the relevance of solving contacts in dense crowds. Results demonstrate that our approach outperforms existing methods for crowd animation in terms of geometric accuracy and overall realism.
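The simplest building block of contact resolution between characters can be sketched as position-based projection of overlapping sphere proxies, as below. The paper's solver operates on full articulated rigid bodies with an efficient physics formulation; this toy version only prevents proxy interpenetration, and all masses, radii, and iteration counts are invented.

```python
# Toy position-based resolution of overlapping sphere proxies.
import numpy as np

def resolve_sphere_overlaps(centers, radii, iterations=10):
    """Push overlapping spheres apart along their center-to-center
    direction, splitting the correction equally between the pair."""
    centers = centers.copy()
    n = len(centers)
    for _ in range(iterations):
        for i in range(n):
            for j in range(i + 1, n):
                d = centers[j] - centers[i]
                dist = np.linalg.norm(d)
                min_dist = radii[i] + radii[j]
                if 1e-9 < dist < min_dist:
                    push = 0.5 * (min_dist - dist) * d / dist
                    centers[i] -= push
                    centers[j] += push
    return centers

# Three proxies, the first two overlapping.
centers = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [2.0, 0.0, 0.0]])
radii = np.array([0.25, 0.25, 0.25])
print(resolve_sphere_overlaps(centers, radii))
```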
Citations: 0
Conditional Mixture Path Guiding for Differentiable Rendering
IF 7.8 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-07-19 | DOI: 10.1145/3658133
Zhimin Fan, Pengcheng Shi, Mufan Guo, Ruoyu Fu, Yanwen Guo, Jie Guo
The efficiency of inverse optimization in physically based differentiable rendering heavily depends on the variance of Monte Carlo estimation. Despite recent advancements emphasizing the necessity of tailored differential sampling strategies, the general approaches remain unexplored. In this paper, we investigate the interplay between local sampling decisions and the estimation of light path derivatives. Considering that modern differentiable rendering algorithms share the same path for estimating differential radiance and ordinary radiance, we demonstrate that conventional guiding approaches, conditioned solely on the last vertex, cannot attain this density. Instead, a mixture of different sampling distributions is required, where the weights are conditioned on all the previously sampled vertices in the path. To embody our theory, we implement a conditional mixture path guiding that explicitly computes optimal weights on the fly. Furthermore, we show how to perform positivization to eliminate sign variance and extend to scenes with millions of parameters. To the best of our knowledge, this is the first generic framework for applying path guiding to differentiable rendering. Extensive experiments demonstrate that our method achieves nearly one order of magnitude improvements over state-of-the-art methods in terms of variance reduction in gradient estimation and errors of inverse optimization. The implementation of our proposed method is available at https://github.com/mollnn/conditional-mixture.
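The core idea of mixing sampling distributions, with the estimator dividing by the mixture pdf, can be sketched generically as below. In the paper the mixture weights are predicted from the entire path prefix; here `mixture_weight` is a constant stand-in, and the two component distributions are ordinary cosine-weighted and uniform hemisphere samplers rather than the paper's guided lobes.

```python
# Generic one-sample mixture sampling over the hemisphere (illustrative).
import numpy as np

rng = np.random.default_rng(5)

def sample_uniform_hemisphere():
    u1, u2 = rng.random(), rng.random()
    z = u1
    r = np.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * np.pi * u2
    return np.array([r * np.cos(phi), r * np.sin(phi), z])

def sample_cosine_hemisphere():
    u1, u2 = rng.random(), rng.random()
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    return np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)])

def pdf_uniform(d):
    return 1.0 / (2.0 * np.pi)

def pdf_cosine(d):
    return max(d[2], 0.0) / np.pi

def mixture_weight(path_prefix):
    """Dummy: a real implementation conditions this on all prior vertices."""
    return 0.5

def sample_direction(path_prefix):
    w = mixture_weight(path_prefix)
    if rng.random() < w:
        d = sample_cosine_hemisphere()
    else:
        d = sample_uniform_hemisphere()
    pdf = w * pdf_cosine(d) + (1.0 - w) * pdf_uniform(d)   # mixture pdf
    return d, pdf

d, pdf = sample_direction(path_prefix=[])
print(d, pdf)
```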
Citations: 0
A Fully-correlated Anisotropic Micrograin BSDF Model
IF 7.8 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Software Engineering | Pub Date: 2024-07-19 | DOI: 10.1145/3658224
Simon Lucas, Mickaël Ribardière, R. Pacanowski, Pascal Barla
We introduce an improved version of the micrograin BSDF model [Lucas et al. 2023] for the rendering of anisotropic porous layers. Our approach leverages the properties of micrograins to take into account the correlation between their height and normal, as well as the correlation between the light and view directions. This allows us to derive an exact analytical expression for the Geometrical Attenuation Factor (GAF), summarizing shadowing and masking inside the porous layer. This fully-correlated GAF is then used to define appropriate mixing weights to blend the BSDFs of the porous and base layers. Furthermore, by generalizing the micrograin shape to anisotropy, combined with the fully-correlated GAF, our improved BSDF model produces effects specific to porous layers, such as retro-reflection visible on dust layers at grazing angles or the height and color correlation found on rusty materials. Finally, we demonstrate very close matches between our BSDF model and light transport simulations realized with explicit instances of micrograins, thus validating our model.
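The blending structure (a porous-layer BSDF and a base BSDF mixed by visibility-derived weights) can be sketched with toy Lambertian lobes, as below. The paper's fully-correlated GAF is not reproduced; `porous_weight` is a placeholder that merely grows toward grazing angles, and all constants are illustrative.

```python
# Toy sketch of blending a porous-layer lobe with a base lobe.
import numpy as np

def lambert(albedo):
    return albedo / np.pi

def porous_weight(cos_theta_i, cos_theta_o, filling_factor=0.3):
    """Placeholder visibility weight for the micrograin layer: more of the
    porous layer is seen/lit at grazing angles (a GAF would refine this)."""
    g = 1.0 - (cos_theta_i * cos_theta_o) ** 0.5
    return np.clip(filling_factor + (1.0 - filling_factor) * g, 0.0, 1.0)

def layered_brdf(cos_theta_i, cos_theta_o,
                 porous_albedo=np.array([0.6, 0.3, 0.2]),
                 base_albedo=np.array([0.2, 0.2, 0.8])):
    w = porous_weight(cos_theta_i, cos_theta_o)
    return w * lambert(porous_albedo) + (1.0 - w) * lambert(base_albedo)

print(layered_brdf(0.9, 0.9))   # near-normal: mostly the base layer
print(layered_brdf(0.1, 0.1))   # grazing: dominated by the porous layer
```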
Citations: 0