
ACM Transactions on Graphics: Latest Articles

PhysFiT: Physical-aware 3D Shape Understanding for Finishing Incomplete Assembly
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-29 DOI: 10.1145/3702226
Weihao Wang, Mingyu You, Hongjun Zhou, Bin He
Understanding the part composition and structure of 3D shapes is crucial for a wide range of 3D applications, including 3D part assembly and 3D assembly completion. Compared to 3D part assembly, 3D assembly completion is more complicated: it involves using a toolkit to repair broken or incomplete furniture that is missing several parts. The primary challenge lies in revealing the potential part relations to infer the absent parts from multiple indistinguishable candidates with similar geometries, and in completing well-connected, structurally stable, and aesthetically pleasing assemblies. This task necessitates not only specialized knowledge of part composition but, more importantly, an awareness of physical constraints, i.e., connectivity, stability, and symmetry. Neglecting these constraints often results in assemblies that, although visually plausible, are impractical. To address this challenge, we propose PhysFiT, a physical-aware 3D shape understanding framework. This framework is built upon attention-based part relation modeling and incorporates connection modeling, simulation-free stability optimization, and symmetric transformation consistency. We evaluate its efficacy on 3D part assembly and 3D assembly completion, a novel assembly task presented in this work. Extensive experiments demonstrate the effectiveness of PhysFiT in constructing geometrically sound and physically compliant assemblies.
Citations: 0
Synchronized tracing of primitive-based implicit volumes
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-10-28 DOI: 10.1145/3702227
Cédric Zanni
Implicit volumes are known for their ability to represent smooth shapes of arbitrary topology thanks to hierarchical combinations of primitives using a structure called a blobtree. We present a new tile-based rendering pipeline well suited for modeling scenarios, i.e., no preprocessing is required when primitive parameters are updated. When using approximate signed distance fields (fields with Lipschitz bound close to 1), we rely on compact, smooth CSG operators - extended from standard bounded operators - to compute a tight augmented bounding volume for all primitives of the blobtree. The pipeline relies on a low-resolution A-buffer storing the primitives of interest of a given screen tile. The A-buffer is then used during ray processing to synchronize threads within a subfrustum. This allows coherent field evaluation within workgroups. We use a sparse bottom-up tree traversal to prune the blobtree on-the-fly which allows us to decorrelate field evaluation complexity from the full blobtree size. The ray processing itself is done using the sphere tracing algorithm. The pipeline scales well to volumes consisting of thousands of primitives.
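The sphere tracing step named in this abstract can be illustrated in isolation. The following is a minimal Python sketch, not the paper's GPU pipeline: it marches a single ray against one hypothetical sphere primitive and has no blobtree, screen tiles, or A-buffer. The names `sphere_sdf` and `sphere_trace` and all default parameters are assumptions made for illustration. Dividing each step by the Lipschitz bound keeps the march from overshooting the surface when the field is only an approximate signed distance.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere (hypothetical test primitive)."""
    return math.dist(p, center) - radius

def sphere_trace(origin, direction, field, lipschitz=1.0,
                 max_steps=128, eps=1e-4, t_max=100.0):
    """March along a unit-length ray; return the hit distance t, or None on a miss.

    `lipschitz` is an upper bound on the field's gradient magnitude, so
    field(p) / lipschitz never exceeds the true distance to the surface.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        value = field(p)
        if value < eps:          # close enough to the surface: report a hit
            return t
        t += value / lipschitz   # conservative step
        if t > t_max:            # left the region of interest
            break
    return None

hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
```

With an exact distance field the Lipschitz bound is 1 and the step equals the field value; a bound slightly above 1 only shortens the steps.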
Citations: 0
TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-24 DOI: 10.1145/3697140
Heming Zhu, Fangneng Zhan, Christian Theobalt, Marc Habermann
Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model or a skeleton, to map points into a pose canonical space while conditioning the NeRF on the skeletal pose. These approaches typically parameterize the neural field with a multi-layer perceptron (MLP), leading to a slow runtime. To address this drawback, we propose TriHuman, a novel human-tailored, deformable, and efficient tri-plane representation, which achieves real-time performance, state-of-the-art pose-controllable geometry synthesis as well as photorealistic rendering quality. At the core, we non-rigidly warp global ray samples into our undeformed tri-plane texture space, which effectively addresses the problem of global points being mapped to the same tri-plane locations. We then show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes. Our results demonstrate a clear step towards higher quality in terms of geometry and appearance modeling of humans as well as runtime performance.
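For readers unfamiliar with tri-plane representations, the lookup itself can be sketched as follows. This is a generic illustration in Python, not TriHuman's implementation: it omits the non-rigid warp and the skeletal conditioning, and it assumes points already lie in a unit cube and that the three plane features are summed (concatenation is another common choice). The names `bilinear` and `triplane_features` and the toy planes are hypothetical.

```python
def bilinear(plane, u, v):
    """Bilinearly sample a 2D grid of feature vectors at (u, v) in [0, 1]^2.

    The grid needs at least 2 rows and 2 columns.
    """
    rows, cols = len(plane), len(plane[0])
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = min(int(x), cols - 2), min(int(y), rows - 2)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        return [p + (q - p) * t for p, q in zip(a, b)]

    top = lerp(plane[y0][x0], plane[y0][x0 + 1], fx)
    bottom = lerp(plane[y0 + 1][x0], plane[y0 + 1][x0 + 1], fx)
    return lerp(top, bottom, fy)

def triplane_features(planes, point):
    """Project a 3D point onto the xy, xz, and yz planes and sum the samples."""
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

# Toy 2x2 planes with one feature channel each.
planes = {
    "xy": [[[0.0], [1.0]], [[0.0], [1.0]]],  # feature equals the first coordinate
    "xz": [[[0.0], [0.0]], [[1.0], [1.0]]],  # feature equals the second coordinate
    "yz": [[[2.0], [2.0]], [[2.0], [2.0]]],  # constant feature
}
features = triplane_features(planes, (0.25, 0.9, 0.5))
```

Because each plane discards one coordinate, two points that differ only along that axis read the same texel on that plane, which is the kind of collision the abstract refers to.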
Citations: 0
DAMO: A Deep Solver for Arbitrary Marker Configuration in Optical Motion Capture
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-14 DOI: 10.1145/3695865
KyeongMin Kim, SeungWon Seo, DongHeun Han, HyeongYeop Kang
Marker-based optical motion capture (mocap) systems are increasingly utilized for acquiring 3D human motion, offering advantages in capturing the subtle nuances of human movement, style consistency, and ease of obtaining desired motion. Motion data acquisition via mocap typically requires laborious marker labeling and motion reconstruction; recent deep-learning solutions have aimed to automate the process. However, such solutions generally presuppose a fixed marker configuration to reduce learning complexity, thereby limiting flexibility. To overcome the limitation, we introduce DAMO, an end-to-end deep solver, proficiently inferring arbitrary marker configurations and optimizing pose reconstruction. DAMO outperforms state-of-the-art methods such as SOMA and MoCap-Solver in scenarios with significant noise and unknown marker configurations. We expect that DAMO will meet various practical demands such as facilitating dynamic marker configuration adjustments during capture sessions, processing marker clouds irrespective of whether they employ mixed or entirely unknown marker configurations, and allowing custom marker configurations to suit distinct capture scenarios. DAMO code and pretrained models are available at https://github.com/CritBear/damo.
Citations: 0
RNA: Relightable Neural Assets
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-09-13 DOI: 10.1145/3695866
Krishna Mullia, Fujun Luan, Xin Sun, Miloš Hašan
High-fidelity 3D assets with materials composed of fibers (including hair), complex layered material shaders, or fine scattering geometry are critical in high-end realistic rendering applications. Rendering such models is computationally expensive due to heavy shaders and long scattering paths. Moreover, implementing the shading and scattering models is non-trivial and has to be done not only in the 3D content authoring software (which is necessarily complex), but also in all downstream rendering solutions. For example, web and mobile viewers for complex 3D assets are desirable, but frequently cannot support the full shading complexity allowed by the authoring application. Our goal is to design a neural representation for 3D assets with complex shading that supports full relightability and full integration into existing renderers. We provide an end-to-end shading solution at the first intersection of a ray with the underlying geometry. All shading and scattering is precomputed and included in the neural asset; no multiple scattering paths need to be traced, and no complex shading models need to be implemented to render our assets, beyond a single neural architecture. We combine an MLP decoder with a feature grid. Shading consists of querying a feature vector, followed by an MLP evaluation producing the final reflectance value. Our method provides high-fidelity shading, close to the ground-truth Monte Carlo estimate even at close-up views. We believe our neural assets could be used in practical renderers, providing significant speed-ups and simplifying renderer implementations.
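The shading query described in this abstract (fetch a feature vector, then evaluate an MLP) can be sketched in a few lines of Python. This is an illustrative toy, not the RNA architecture: the 2D feature grid indexed by surface coordinates, the choice of view and light directions as extra MLP inputs, the single hidden layer, and every weight value below are assumptions, and the names `bilinear`, `dense`, and `shade` are hypothetical.

```python
def bilinear(grid, u, v):
    """Bilinearly interpolate a 2D grid of feature vectors at (u, v) in [0, 1]^2."""
    rows, cols = len(grid), len(grid[0])
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = min(int(x), cols - 2), min(int(y), rows - 2)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        return [p + (q - p) * t for p, q in zip(a, b)]

    return lerp(lerp(grid[y0][x0], grid[y0][x0 + 1], fx),
                lerp(grid[y0 + 1][x0], grid[y0 + 1][x0 + 1], fx), fy)

def dense(weights, bias, x):
    """One fully connected layer: weights is a list of rows, one per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def shade(grid, mlp, uv, view_dir, light_dir):
    """Query the feature grid, then decode reflectance with a one-hidden-layer MLP."""
    feature = bilinear(grid, uv[0], uv[1])
    x = feature + list(view_dir) + list(light_dir)
    hidden = [max(0.0, h) for h in dense(mlp["w1"], mlp["b1"], x)]  # ReLU
    return dense(mlp["w2"], mlp["b2"], hidden)

# Toy asset: a constant 1-channel feature grid and hand-picked weights.
grid = [[[1.0], [1.0]], [[1.0], [1.0]]]
mlp = {
    "w1": [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
           [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
    "b1": [0.0, 0.0],
    "w2": [[0.5, 0.5]],
    "b2": [0.1],
}
reflectance = shade(grid, mlp, (0.3, 0.6), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
```

The point of the sketch is the cost profile: one interpolated fetch plus a fixed number of multiply-adds per shading point, with no scattering paths traced at render time.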
Citations: 0
Speed-Aware Audio-Driven Speech Animation using Adaptive Windows
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-31 DOI: 10.1145/3691341
Sunjin Jung, Yeongho Seol, Kwanggyoon Seo, Hyeonho Na, Seonghyeon Kim, Vanessa Tan, Junyong Noh
We present a novel method that can generate realistic speech animations of a 3D face from audio using multiple adaptive windows. In contrast to previous studies that use a fixed size audio window, our method accepts an adaptive audio window as input, reflecting the audio speaking rate to use consistent phonemic information. Our system consists of three parts. First, the speaking rate is estimated from the input audio using a neural network trained in a self-supervised manner. Second, the appropriate window size that encloses the audio features is predicted adaptively based on the estimated speaking rate. Another key element lies in the use of multiple audio windows of different sizes as input to the animation generator: a small window to concentrate on detailed information and a large window to consider broad phonemic information near the center frame. Finally, the speech animation is generated from the multiple adaptive audio windows. Our method can generate realistic speech animations from in-the-wild audios at any speaking rate, i.e., fast raps, slow songs, as well as normal speech. We demonstrate via extensive quantitative and qualitative evaluations including a user study that our method outperforms state-of-the-art approaches.
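The windowing idea in this abstract can be made concrete with a small Python sketch. It is a stand-in under stated assumptions, not the authors' method: the paper predicts the window size from a learned speaking-rate estimate, whereas `window_size` below uses a hypothetical inverse-proportional rule, and the constants, the factor-of-two large window, and the function names are all invented for illustration.

```python
def window_size(speaking_rate, base_size=16, base_rate=4.0,
                min_size=4, max_size=64):
    """Map a speaking rate to a window length in frames.

    Assumed rule: faster speech gets a shorter window, so each window
    covers a comparable amount of phonemic content.
    """
    if speaking_rate <= 0:
        raise ValueError("speaking_rate must be positive")
    size = round(base_size * base_rate / speaking_rate)
    return max(min_size, min(max_size, size))

def extract_window(features, center, size):
    """Take frames centered on `center`, clamping indices at the sequence ends."""
    half = size // 2
    last = len(features) - 1
    return [features[min(max(i, 0), last)]
            for i in range(center - half, center + half + 1)]

def adaptive_windows(features, center, speaking_rate, large_scale=2):
    """Return a (small, large) pair of windows around the same center frame."""
    small = window_size(speaking_rate)
    return (extract_window(features, center, small),
            extract_window(features, center, small * large_scale))

frames = list(range(100))  # stand-in for per-frame audio features
small, large = adaptive_windows(frames, center=50, speaking_rate=8.0)
```

Both windows share the center frame, so the small one carries local detail while the large one supplies surrounding context for the same animation frame.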
Citations: 0
ControlMat: A Controlled Generative Approach to Material Capture
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-08-27 DOI: 10.1145/3688830
Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur
Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis one, leveraging the recent progress in generative deep networks. We present ControlMat, a method which, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials that could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space optimization methods, and we carefully validate our diffusion process design choices.
Citations: 0
A Closest Point Method for PDEs on Manifolds with Interior Boundary Conditions for Geometry Processing
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-06-17 DOI: 10.1145/3673652
Nathan King, Haozhe Su, Mridul Aanjaneya, Steven Ruuth, Christopher Batty

Many geometry processing techniques require the solution of partial differential equations (PDEs) on manifolds embedded in \(\mathbb{R}^2\) or \(\mathbb{R}^3\), such as curves or surfaces. Such manifold PDEs often involve boundary conditions (e.g., Dirichlet or Neumann) prescribed at points or curves on the manifold’s interior or along the geometric (exterior) boundary of an open manifold. However, input manifolds can take many forms (e.g., triangle meshes, parametrizations, point clouds, implicit functions, etc.). Typically, one must generate a mesh to apply finite element-type techniques or derive specialized discretization procedures for each distinct manifold representation. We propose instead to address such problems in a unified manner through a novel extension of the closest point method (CPM) to handle interior boundary conditions. CPM solves the manifold PDE by solving a volumetric PDE defined over the Cartesian embedding space containing the manifold, and requires only a closest point representation of the manifold. Hence, CPM supports objects that are open or closed, orientable or not, and of any codimension. To enable support for interior boundary conditions we derive a method that implicitly partitions the embedding space across interior boundaries. CPM’s finite difference and interpolation stencils are adapted to respect this partition while preserving second-order accuracy. Additionally, we develop an efficient sparse-grid implementation and numerical solver that can scale to tens of millions of degrees of freedom, allowing PDEs to be solved on more complex manifolds. We demonstrate our method’s convergence behaviour on selected model PDEs and explore several geometry processing problems: diffusion curves on surfaces, geodesic distance, tangent vector field design, harmonic map construction, and reaction-diffusion textures. 
Our proposed approach thus offers a powerful and flexible new tool for a range of geometry processing tasks on general manifold representations.
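The principle that lets CPM replace a manifold PDE with a volumetric one can be checked numerically on a unit circle. The Python sketch below is an illustration of the basic closest point idea only, not the paper's solver: it has no computational grid, interpolation stencil, sparse storage, or interior boundary handling. The test function u(θ) = cos θ, the step size, and all function names are assumptions. It extends u off the circle by composing it with the closest point map, applies an ordinary 5-point Cartesian Laplacian at a point on the circle, and compares the result with the known surface Laplacian, which is −cos θ.

```python
import math

def closest_point(x, y):
    """Closest point on the unit circle to (x, y); undefined at the origin."""
    r = math.hypot(x, y)
    return x / r, y / r

def u_on_circle(theta):
    """Surface data: a function defined only on the circle."""
    return math.cos(theta)

def extended(x, y):
    """Closest point extension: constant along normals to the circle."""
    cx, cy = closest_point(x, y)
    return u_on_circle(math.atan2(cy, cx))

def cartesian_laplacian(f, x, y, h=1e-3):
    """Standard 5-point finite difference Laplacian in the embedding plane."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)

def surface_laplacian_via_cpm(theta, h=1e-3):
    """Evaluate the Cartesian Laplacian of the extension at a point on the circle."""
    return cartesian_laplacian(extended, math.cos(theta), math.sin(theta), h)

approx = surface_laplacian_via_cpm(0.7)
exact = -math.cos(0.7)
```

Only `closest_point` knows anything about the geometry, which is why the same Cartesian stencil can serve meshes, point clouds, or implicit surfaces once a closest point query is available.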

Citations: 0
Analytic rotation-invariant modelling of anisotropic finite elements
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-05-28 DOI: 10.1145/3666086
Huancheng Lin, Floyd Mulenga Chitalu, Taku Komura

Anisotropic hyperelastic distortion energies are used to solve many problems in fields like computer graphics and engineering, with applications in shape analysis, deformation, design, mesh parameterization, biomechanics and more. However, formulating a robust anisotropic energy that is low-order and yet sufficiently non-linear remains a challenging problem for achieving the convergence promised by Newton-type methods in numerical optimization. In this paper, we propose a novel analytic formulation of an anisotropic energy that is smooth everywhere, low-order, rotationally invariant and at least twice differentiable. At its core, our approach utilizes implicit rotation factorizations with invariants of the Cauchy-Green tensor that arises from the deformation gradient. The versatility and generality of our analysis are demonstrated through a variety of examples, where we also show that the constitutive law suggested by the anisotropic version of the well-known As-Rigid-As-Possible energy is the foundational parametric description of both passive and active elastic materials. The generality of our approach means that we can systematically derive the force and force-Jacobian expressions for use in implicit and quasistatic numerical optimization schemes, and we can also use our analysis to rewrite, simplify and speed up several existing anisotropic and isotropic distortion energies with guaranteed inversion-safety.
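The reason invariants of the Cauchy-Green tensor are useful for rotation-invariant energies can be seen in a short numerical check. This is a minimal sketch using the textbook invariants of C = FᵀF together with two standard fibre invariants for a direction a; the abstract does not say which invariants the paper uses, so this particular set, the fibre direction, and all function names are assumptions made for illustration. Applying a rotation R after the deformation (F → RF) leaves C, and therefore every invariant, unchanged.

```python
import numpy as np

def cauchy_green_invariants(F, a):
    """Isotropic (I1, I2, I3) and fibre (I4, I5) invariants of C = F^T F."""
    C = F.T @ F
    I1 = np.trace(C)
    I2 = 0.5 * (I1**2 - np.trace(C @ C))
    I3 = np.linalg.det(C)
    I4 = a @ C @ a        # squared stretch along the fibre direction a
    I5 = a @ C @ C @ a
    return np.array([I1, I2, I3, I4, I5])

def random_rotation(rng):
    """Random proper rotation from a QR factorization (determinant forced to +1)."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

rng = np.random.default_rng(0)
F = np.eye(3) + 0.3 * rng.standard_normal((3, 3))  # illustrative deformation gradient
a = np.array([1.0, 0.0, 0.0])                      # illustrative fibre direction
R = random_rotation(rng)

before = cauchy_green_invariants(F, a)
after = cauchy_green_invariants(R @ F, a)          # rotate the deformed configuration
print(np.allclose(before, after))
```

An energy written purely in terms of such invariants is rotation invariant by construction, without having to extract the rotation explicitly through a polar decomposition.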

Citations: 0
A Framework for Solving Parabolic Partial Differential Equations on Discrete Domains
IF 6.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2024-05-28 DOI: 10.1145/3666087
Leticia Mattos Da Silva, Oded Stein, Justin Solomon

We introduce a framework for solving a class of parabolic partial differential equations on triangle mesh surfaces, including the Hamilton-Jacobi equation and the Fokker-Planck equation. PDEs in this class often have nonlinear or stiff terms that cannot be resolved with standard methods on curved triangle meshes. To address this challenge, we leverage a splitting integrator combined with a convex optimization step to solve these PDEs. Our machinery can be used to compute entropic approximations of optimal transport distances on geometric domains, overcoming the numerical limitations of the state-of-the-art method. In addition, we demonstrate the versatility of our method on a number of linear and nonlinear PDEs that appear in diffusion and front propagation tasks in geometry processing.
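The general idea of a splitting integrator can be illustrated on a much simpler problem than the ones the paper targets. The sketch below is an assumption-laden stand-in, not the paper's scheme: it uses a 1D periodic grid instead of a triangle mesh, a logistic reaction term u(1 − u) as the nonlinear part, and a plain implicit Euler linear solve for diffusion in place of the paper's convex optimization step. All function names and parameter values are hypothetical. Each time step advances the linear diffusion part and the nonlinear part separately (first-order Lie splitting).

```python
import numpy as np

def periodic_laplacian(n, dx):
    """Dense second-difference matrix with periodic boundary conditions."""
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, -1] = L[-1, 0] = 1.0
    return L / dx**2

def diffusion_substep(u, A):
    """Implicit Euler step for u_t = D u_xx: solve (I - dt*D*L) u_new = u."""
    return np.linalg.solve(A, u)

def logistic_substep(u, dt):
    """Exact flow of the nonlinear ODE u_t = u (1 - u) over a time dt."""
    growth = np.exp(dt)
    return u * growth / (1.0 + u * (growth - 1.0))

def lie_splitting_solve(u0, diffusivity, dx, dt, steps):
    """Alternate the linear and nonlinear substeps (first-order Lie splitting)."""
    n = u0.size
    A = np.eye(n) - dt * diffusivity * periodic_laplacian(n, dx)
    u = u0.copy()
    for _ in range(steps):
        u = diffusion_substep(u, A)
        u = logistic_substep(u, dt)
    return u

n = 64
x = np.linspace(0.0, 1.0, n, endpoint=False)
u0 = 0.5 + 0.4 * np.sin(2.0 * np.pi * x)  # illustrative initial data in [0.1, 0.9]
u = lie_splitting_solve(u0, diffusivity=0.01, dx=1.0 / n, dt=0.01, steps=100)
print(u.min(), u.max())
```

Splitting lets each part be handled by a method suited to it: the stiff linear part is treated implicitly, while the nonlinear part is integrated on its own. In this toy setting both substeps keep values inside [0, 1], so the combined scheme does as well.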

Citations: 0