Understanding the part composition and structure of 3D shapes is crucial for a wide range of 3D applications, including 3D part assembly and 3D assembly completion. Compared to 3D part assembly, 3D assembly completion is more complicated: it involves repairing broken or incomplete furniture that is missing several parts, using a toolkit. The primary challenge lies in revealing the latent part relations needed to infer the absent parts from multiple hard-to-distinguish candidates with similar geometries, and in completing the shape into a well-connected, structurally stable, and aesthetically pleasing assembly. This task requires not only specialized knowledge of part composition but, more importantly, an awareness of physical constraints, i.e., connectivity, stability, and symmetry. Neglecting these constraints often results in assemblies that, although visually plausible, are impractical. To address this challenge, we propose PhysFiT, a physical-aware 3D shape understanding framework. This framework is built upon attention-based part relation modeling and incorporates connection modeling, simulation-free stability optimization, and symmetric transformation consistency. We evaluate its efficacy on 3D part assembly and on 3D assembly completion, a novel assembly task presented in this work. Extensive experiments demonstrate the effectiveness of PhysFiT in constructing geometrically sound and physically compliant assemblies.
PhysFiT: Physical-aware 3D Shape Understanding for Finishing Incomplete Assembly. Weihao Wang, Mingyu You, Hongjun Zhou, Bin He. ACM Transactions on Graphics, published 2024-10-29. https://doi.org/10.1145/3702226
Implicit volumes are known for their ability to represent smooth shapes of arbitrary topology thanks to hierarchical combinations of primitives using a structure called a blobtree. We present a new tile-based rendering pipeline well suited to modeling scenarios: no preprocessing is required when primitive parameters are updated. When using approximate signed distance fields (fields with Lipschitz bound close to 1), we rely on compact, smooth CSG operators, extended from standard bounded operators, to compute a tight augmented bounding volume for all primitives of the blobtree. The pipeline relies on a low-resolution A-buffer storing the primitives of interest for a given screen tile. The A-buffer is then used during ray processing to synchronize threads within a subfrustum, which allows coherent field evaluation within workgroups. We use a sparse bottom-up tree traversal to prune the blobtree on the fly, which decouples field evaluation complexity from the full blobtree size. The ray processing itself is done using the sphere tracing algorithm. The pipeline scales well to volumes consisting of thousands of primitives.
Synchronized tracing of primitive-based implicit volumes. Cédric Zanni. ACM Transactions on Graphics, published 2024-10-28. https://doi.org/10.1145/3702227
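The abstract above names sphere tracing over smoothly blended primitives as the core ray-processing step. The following is a minimal CPU sketch of that idea only, not the paper's tile-based GPU pipeline. The two-sphere scene, the blend radius `k`, and the polynomial smooth minimum (a common stand-in, not the compact operators the paper extends) are all assumptions made for illustration.

```python
import math

def sphere_sdf(p, center, radius):
    """Exact signed distance from point p to a sphere."""
    return math.dist(p, center) - radius

def smooth_union(a, b, k):
    """Polynomial smooth minimum; equals min(a, b) when |a - b| >= k."""
    h = max(k - abs(a - b), 0.0) / k
    return min(a, b) - h * h * k * 0.25

def field(p):
    """Hypothetical scene: two unit spheres blended with radius k = 0.5."""
    a = sphere_sdf(p, (0.0, 0.0, 0.0), 1.0)
    b = sphere_sdf(p, (1.5, 0.0, 0.0), 1.0)
    return smooth_union(a, b, 0.5)

def sphere_trace(origin, direction, max_steps=256, eps=1e-4, max_dist=100.0):
    """March along a unit-length direction; return the hit distance t or None."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = field(p)
        if d < eps:
            return t
        t += d  # safe step because this field has a Lipschitz bound of at most 1
        if t > max_dist:
            break
    return None
```

Calling `sphere_trace((0.0, 0.0, -5.0), (0.0, 0.0, 1.0))` marches toward the first sphere and stops once the field value drops below `eps`; a ray that leaves the scene returns `None`.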
Heming Zhu, Fangneng Zhan, Christian Theobalt, Marc Habermann
Creating controllable, photorealistic, and geometrically detailed digital doubles of real humans solely from video data is a key challenge in Computer Graphics and Vision, especially when real-time performance is required. Recent methods attach a neural radiance field (NeRF) to an articulated structure, e.g., a body model or a skeleton, to map points into a pose-canonical space while conditioning the NeRF on the skeletal pose. These approaches typically parameterize the neural field with a multi-layer perceptron (MLP), leading to a slow runtime. To address this drawback, we propose TriHuman, a novel human-tailored, deformable, and efficient tri-plane representation, which achieves real-time performance, state-of-the-art pose-controllable geometry synthesis, and photorealistic rendering quality. At the core, we non-rigidly warp global ray samples into our undeformed tri-plane texture space, which effectively addresses the problem of global points being mapped to the same tri-plane locations. We then show how such a tri-plane feature representation can be conditioned on the skeletal motion to account for dynamic appearance and geometry changes. Our results demonstrate a clear step towards higher quality in terms of geometry and appearance modeling of humans as well as runtime performance.
TriHuman: A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis. Heming Zhu, Fangneng Zhan, Christian Theobalt, Marc Habermann. ACM Transactions on Graphics, published 2024-09-24. https://doi.org/10.1145/3697140
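To make the tri-plane idea in the abstract above concrete, here is a small sketch of a generic tri-plane feature lookup: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the results are combined. The plane names, the `[0, 1]` coordinate range, and combining by summation are assumptions for illustration. The paper's non-rigid warp into the undeformed space and its pose conditioning are not modeled here.

```python
def bilinear(grid, u, v):
    """Sample grid[row][col] (a feature vector per cell) at u, v in [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(grid[0][0])):
        top = grid[y0][x0][c] * (1 - fx) + grid[y0][x1][c] * fx
        bottom = grid[y1][x0][c] * (1 - fx) + grid[y1][x1][c] * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out

def triplane_features(planes, p):
    """Project p onto the xy, xz, and yz planes and sum the sampled features."""
    x, y, z = p
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]
```

Two points that differ only in `z` sample the same `xy` texel, so the lookup can tell them apart only through the other two planes. This is why the quality of the mapping into tri-plane space matters.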
KyeongMin Kim, SeungWon Seo, DongHeun Han, HyeongYeop Kang
Marker-based optical motion capture (mocap) systems are increasingly utilized for acquiring 3D human motion, offering advantages in capturing the subtle nuances of human movement, style consistency, and ease of obtaining desired motion. Motion data acquisition via mocap typically requires laborious marker labeling and motion reconstruction; recent deep-learning solutions have aimed to automate the process. However, such solutions generally presuppose a fixed marker configuration to reduce learning complexity, thereby limiting flexibility. To overcome this limitation, we introduce DAMO, an end-to-end deep solver that infers arbitrary marker configurations and optimizes pose reconstruction. DAMO outperforms state-of-the-art methods such as SOMA and MoCap-Solver in scenarios with significant noise and unknown marker configurations. We expect that DAMO will meet various practical demands, such as facilitating dynamic marker configuration adjustments during capture sessions, processing marker clouds irrespective of whether they employ mixed or entirely unknown marker configurations, and allowing custom marker configurations to suit distinct capture scenarios. DAMO code and pretrained models are available at https://github.com/CritBear/damo.
DAMO: A Deep Solver for Arbitrary Marker Configuration in Optical Motion Capture. KyeongMin Kim, SeungWon Seo, DongHeun Han, HyeongYeop Kang. ACM Transactions on Graphics, published 2024-09-14. https://doi.org/10.1145/3695865
High-fidelity 3D assets with materials composed of fibers (including hair), complex layered material shaders, or fine scattering geometry are critical in high-end realistic rendering applications. Rendering such models is computationally expensive due to heavy shaders and long scattering paths. Moreover, implementing the shading and scattering models is non-trivial and has to be done not only in the 3D content authoring software (which is necessarily complex), but also in all downstream rendering solutions. For example, web and mobile viewers for complex 3D assets are desirable, but frequently cannot support the full shading complexity allowed by the authoring application. Our goal is to design a neural representation for 3D assets with complex shading that supports full relightability and full integration into existing renderers. We provide an end-to-end shading solution at the first intersection of a ray with the underlying geometry. All shading and scattering is precomputed and included in the neural asset; no multiple scattering paths need to be traced, and no complex shading models need to be implemented to render our assets, beyond a single neural architecture. We combine an MLP decoder with a feature grid. Shading consists of querying a feature vector, followed by an MLP evaluation producing the final reflectance value. Our method provides high-fidelity shading, close to the ground-truth Monte Carlo estimate even at close-up views. We believe our neural assets could be used in practical renderers, providing significant speed-ups and simplifying renderer implementations.
RNA: Relightable Neural Assets. Krishna Mullia, Fujun Luan, Xin Sun, Miloš Hašan. ACM Transactions on Graphics, published 2024-09-13. https://doi.org/10.1145/3695866
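The abstract above describes shading as a feature-vector query followed by an MLP evaluation. A rough sketch of the decoding half, under stated assumptions: the network input is the queried feature concatenated with a view direction and a light direction, the hidden layers use ReLU, and the layer shapes and weights are hypothetical. None of these details are confirmed by the abstract.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(weights, bias, x):
    """weights is a list of rows; returns weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def mlp(x, layers):
    """Evaluate dense layers with ReLU between them and a linear output layer."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(weights, bias, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def shade(feature, view_dir, light_dir, layers):
    """Decode one reflectance value from a queried feature and two directions."""
    return mlp(list(feature) + list(view_dir) + list(light_dir), layers)
```

In a renderer, `shade` would run once at the first ray intersection, which is what lets the asset avoid tracing multiple scattering paths at render time.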
Sunjin Jung, Yeongho Seol, Kwanggyoon Seo, Hyeonho Na, Seonghyeon Kim, Vanessa Tan, Junyong Noh
We present a novel method that can generate realistic speech animations of a 3D face from audio using multiple adaptive windows. In contrast to previous studies that use a fixed-size audio window, our method accepts an adaptive audio window as input, reflecting the audio speaking rate to use consistent phonemic information. Our system consists of three parts. First, the speaking rate is estimated from the input audio using a neural network trained in a self-supervised manner. Second, the appropriate window size that encloses the audio features is predicted adaptively based on the estimated speaking rate. Another key element lies in the use of multiple audio windows of different sizes as input to the animation generator: a small window to concentrate on detailed information and a large window to consider broad phonemic information near the center frame. Finally, the speech animation is generated from the multiple adaptive audio windows. Our method can generate realistic speech animations from in-the-wild audio at any speaking rate, e.g., fast raps, slow songs, and normal speech. We demonstrate via extensive quantitative and qualitative evaluations, including a user study, that our method outperforms state-of-the-art approaches.
Speed-Aware Audio-Driven Speech Animation using Adaptive Windows. Sunjin Jung, Yeongho Seol, Kwanggyoon Seo, Hyeonho Na, Seonghyeon Kim, Vanessa Tan, Junyong Noh. ACM Transactions on Graphics, published 2024-08-31. https://doi.org/10.1145/3691341
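The windowing scheme in the abstract above can be illustrated with a short sketch. It assumes a simple inverse-proportional rule between speaking rate and window size, whereas the paper predicts the size with a learned component. The base sizes, the reference rate, and the edge-padding policy are likewise hypothetical.

```python
def adaptive_window_size(base_size, speaking_rate, reference_rate=1.0):
    """Shrink the window for fast speech, grow it for slow speech (assumed rule)."""
    size = max(1, round(base_size * reference_rate / speaking_rate))
    return size if size % 2 == 1 else size + 1  # keep it odd so it has a center

def extract_window(frames, center, size):
    """Return `size` frames centered on `center`, repeating edge frames as padding."""
    half = size // 2
    last = len(frames) - 1
    return [frames[min(max(i, 0), last)]
            for i in range(center - half, center + half + 1)]

def multi_window_input(frames, center, speaking_rate, small_base=5, large_base=17):
    """Build the small (detail) and large (context) windows for one output frame."""
    small = extract_window(
        frames, center, adaptive_window_size(small_base, speaking_rate))
    large = extract_window(
        frames, center, adaptive_window_size(large_base, speaking_rate))
    return small, large
```

Both windows share the same center frame, so the generator sees fine detail and broader phonemic context for the same instant.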
Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur
Material reconstruction from a photograph is a key component of 3D content creation democratization. We propose to formulate this ill-posed problem as a controlled synthesis problem, leveraging the recent progress in generative deep networks. We present ControlMat, a method which, given a single photograph with uncontrolled illumination as input, conditions a diffusion model to generate plausible, tileable, high-resolution physically-based digital materials. We carefully analyze the behavior of diffusion models for multi-channel outputs, adapt the sampling process to fuse multi-scale information, and introduce rolled diffusion to enable both tileability and patched diffusion for high-resolution outputs. Our generative approach further permits exploration of a variety of materials that could correspond to the input image, mitigating the unknown lighting conditions. We show that our approach outperforms recent inference and latent-space optimization methods, and we carefully validate our diffusion process design choices.
ControlMat: A Controlled Generative Approach to Material Capture. Giuseppe Vecchio, Rosalie Martin, Arthur Roullier, Adrien Kaiser, Romain Rouffet, Valentin Deschaintre, Tamy Boubekeur. ACM Transactions on Graphics, published 2024-08-27. https://doi.org/10.1145/3688830
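One plausible reading of the rolled diffusion mentioned in the abstract above is that the image is circularly shifted by a random offset before each denoising step, so that the borders land in the interior and the result wraps seamlessly. The sketch below shows only that roll and unroll bookkeeping. The `denoise_step` callable is a hypothetical placeholder for a real diffusion step, and the per-step random offset is an assumption, not the paper's confirmed schedule.

```python
import random

def roll2d(image, dy, dx):
    """Circularly shift a 2D list by dy rows and dx columns (wrap-around)."""
    rows, cols = len(image), len(image[0])
    return [[image[(r - dy) % rows][(c - dx) % cols] for c in range(cols)]
            for r in range(rows)]

def rolled_step(image, denoise_step, rng):
    """Roll by a random offset, apply one denoising step, then undo the roll."""
    rows, cols = len(image), len(image[0])
    dy, dx = rng.randrange(rows), rng.randrange(cols)
    rolled = roll2d(image, dy, dx)
    return roll2d(denoise_step(rolled), -dy, -dx)
```

Because the shift is undone after every step, the output stays aligned with the input photograph while the seam position changes from step to step.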
Gonzalo Gomez-Nogales, Melania Prieto-Martin, Cristian Romero, Marc Comino-Trinidad, Pablo Ramon-Prieto, A. Olivier, Ludovic Hoyet, Miguel Otaduy, J. Pettré, Dan Casas
We propose a novel contact-aware method to synthesize highly dense 3D crowds of animated characters. Existing methods animate crowds by first computing the 2D global motion, approximating subjects as 2D particles, and then introducing individual character motions without considering their surroundings. This creates the illusion of a 3D crowd, but at high density characters frequently intersect each other, since character-to-character contact is not modeled. We tackle this issue and propose a general method that takes any crowd animation as input and resolves its residual collisions. To this end, we take a physics-based approach to model contacts between articulated characters. This enables the real-time synthesis of 3D high-density crowds with dozens of individuals that do not intersect each other, producing an unprecedented level of physical correctness in animations. Under the hood, we model each individual using a parametric human body incorporating a set of 3D proxies to approximate their volume. We then build a large system of articulated rigid bodies, and use an efficient physics-based approach to solve for individual body poses that do not collide with each other while maintaining the overall motion of the crowd. We first validate our approach objectively and quantitatively. We then explore relations between physical correctness and perceived realism based on an extensive user study that evaluates the relevance of solving contacts in dense crowds. Results demonstrate that our approach outperforms existing methods for crowd animation in terms of geometric accuracy and overall realism.
Resolving Collisions in Dense 3D Crowd Animations. Gonzalo Gomez-Nogales, Melania Prieto-Martin, Cristian Romero, Marc Comino-Trinidad, Pablo Ramon-Prieto, A. Olivier, Ludovic Hoyet, Miguel Otaduy, J. Pettré, Dan Casas. ACM Transactions on Graphics, published 2024-08-10. https://doi.org/10.1145/3687266
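The abstract above approximates each character's volume with 3D proxies and solves for non-colliding poses. As a much simpler stand-in for that articulated rigid-body solve, the sketch below treats every proxy as a free sphere in 3D and iteratively pushes overlapping pairs apart along the line between their centers. The function name, the equal split of the correction, and the iteration count are assumptions for illustration.

```python
import math

def resolve_sphere_overlaps(centers, radii, iterations=10):
    """Iteratively push overlapping 3D sphere proxies apart along their center line."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        moved = False
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                delta = [b - a for a, b in zip(centers[i], centers[j])]
                dist = math.sqrt(sum(d * d for d in delta))
                overlap = radii[i] + radii[j] - dist
                if overlap <= 1e-9:
                    continue
                # Coincident centers have no direction; pick the x axis arbitrarily.
                normal = ([d / dist for d in delta]
                          if dist > 1e-12 else [1.0, 0.0, 0.0])
                for k in range(3):
                    centers[i][k] -= 0.5 * overlap * normal[k]
                    centers[j][k] += 0.5 * overlap * normal[k]
                moved = True
        if not moved:
            break
    return [tuple(c) for c in centers]
```

A full solver would also have to keep the proxies attached to their skeletons and preserve the crowd's overall motion, which this sketch ignores.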
The efficiency of inverse optimization in physically based differentiable rendering heavily depends on the variance of Monte Carlo estimation. Despite recent advancements emphasizing the necessity of tailored differential sampling strategies, general approaches remain unexplored. In this paper, we investigate the interplay between local sampling decisions and the estimation of light path derivatives. Considering that modern differentiable rendering algorithms share the same path for estimating differential radiance and ordinary radiance, we demonstrate that conventional guiding approaches, conditioned solely on the last vertex, cannot attain the optimal sampling density. Instead, a mixture of different sampling distributions is required, where the weights are conditioned on all the previously sampled vertices in the path. To embody our theory, we implement a conditional mixture path guiding that explicitly computes optimal weights on the fly. Furthermore, we show how to perform positivization to eliminate sign variance and extend to scenes with millions of parameters. To the best of our knowledge, this is the first generic framework for applying path guiding to differentiable rendering. Extensive experiments demonstrate that our method achieves nearly an order-of-magnitude improvement over state-of-the-art methods in terms of variance reduction in gradient estimation and errors of inverse optimization. The implementation of our proposed method is available at https://github.com/mollnn/conditional-mixture.
Conditional Mixture Path Guiding for Differentiable Rendering. Zhimin Fan, Pengcheng Shi, Mufan Guo, Ruoyu Fu, Yanwen Guo, Jie Guo. ACM Transactions on Graphics, published 2024-07-19. https://doi.org/10.1145/3658133
Simon Lucas, Mickaël Ribardière, R. Pacanowski, Pascal Barla
We introduce an improved version of the micrograin BSDF model [Lucas et al. 2023] for the rendering of anisotropic porous layers. Our approach leverages the properties of micrograins to take into account the correlation between their height and normal, as well as the correlation between the light and view directions. This allows us to derive an exact analytical expression for the Geometrical Attenuation Factor (GAF), which summarizes shadowing and masking inside the porous layer. This fully-correlated GAF is then used to define appropriate mixing weights to blend the BSDFs of the porous and base layers. Furthermore, by generalizing the micrograin shape to anisotropy and combining it with the fully-correlated GAF, our improved BSDF model produces effects specific to porous layers, such as the retro-reflection visible on dust layers at grazing angles, or the height and color correlation found on rusty materials. Finally, we demonstrate very close matches between our BSDF model and light transport simulations performed with explicit instances of micrograins, thus validating our model.
{"title":"A Fully-correlated Anisotropic Micrograin BSDF Model","authors":"Simon Lucas, Mickaël Ribardière, R. Pacanowski, Pascal Barla","doi":"10.1145/3658224","DOIUrl":"https://doi.org/10.1145/3658224","journal":{"name":"ACM Transactions on Graphics"},"PeriodicalIF":7.8,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","RegionCategory":"Computer Science"}