Juntian Ye, Yu Hong, Xiongfei Su, Xin Yuan, Feihu Xu
Non-line-of-sight (NLOS) imaging can recover 3D images of scenes outside the direct line of sight, which is of growing interest for diverse applications. Despite remarkable progress, NLOS imaging of dynamic objects remains challenging because reconstructing a single frame requires a large number of multibounce photons. To overcome this obstacle, we develop a computational framework for dynamic time-of-flight NLOS imaging based on plug-and-play (PnP) algorithms. By combining the imaging forward model with a deep denoising network from the computer vision community, we demonstrate 3D NLOS video recovery (128 × 128 × 512) at 4 frames per second (fps) in post-processing. Our method leverages the temporal similarity among adjacent frames and incorporates sparse priors and frequency filtering, enabling higher-quality reconstructions of complex scenes. Extensive experiments on both simulated and real data verify the superior performance of the proposed algorithm.
{"title":"Plug-and-Play Algorithms for Dynamic Non-line-of-sight Imaging","authors":"Juntian Ye, Yu Hong, Xiongfei Su, Xin Yuan, Feihu Xu","doi":"10.1145/3665139","DOIUrl":"https://doi.org/10.1145/3665139","url":null,"abstract":"<p>Non-line-of-sight (NLOS) imaging has the ability to recover 3D images of scenes outside the direct line of sight, which is of growing interest for diverse applications. Despite the remarkable progress, NLOS imaging of dynamic objects is still challenging. It requires a large amount of multibounce photons for the reconstruction of single frame data. To overcome this obstacle, we develop a computational framework for dynamic time-of-flight NLOS imaging based on plug-and-play (PnP) algorithms. By combining imaging forward model with the deep denoising network from the computer vision community, we show a 4 frames-per-second (fps) 3D NLOS video recovery (128 × 128 × 512) in post processing. Our method leverages the temporal similarity among adjacent frames and incorporates sparse priors and frequency filtering. This enables higher-quality reconstructions for complex scenes. Extensive experiments are conducted to verify the superior performance of our proposed algorithm both through simulations and real data.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"32 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140919527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lukas Lipp, David Hahn, Pierre Ecormier-Nocca, Florian Rist, Michael Wimmer
Differentiable rendering methods promise the ability to optimize various parameters of 3D scenes to achieve a desired result. However, lighting design has so far received little attention in this field. In this paper, we introduce a method that enables continuous optimization of the arrangement of luminaires in a 3D scene via differentiable light tracing. Our experiments reveal two major issues when attempting to apply existing methods from differentiable path tracing to this problem: first, many rendering methods produce images, which restricts a designer to defining lighting objectives in image space; second, most previous methods are designed for scene geometry or material optimization and have not been extensively tested for optimizing light sources. In our experience, currently available differentiable ray-tracing methods do not provide satisfactory performance even on fairly basic test cases. We therefore propose a novel adjoint light tracing method that overcomes these challenges and enables gradient-based lighting design optimization in a view-independent (camera-free) way. This allows the user to paint illumination targets directly onto the 3D scene or to use existing baked illumination data (e.g., light maps). Using modern ray-tracing hardware, we achieve interactive performance. We find light tracing advantageous over path tracing in this setting, as it naturally handles irregular geometry, resulting in less noise and improved optimization convergence. We compare our adjoint gradients to state-of-the-art image-based differentiable rendering methods. We also demonstrate that our gradient data works with various common optimization algorithms, providing good convergence behaviour. Qualitative comparisons with real-world scenes underline the practical applicability of our method.
{"title":"View-Independent Adjoint Light Tracing for Lighting Design Optimization","authors":"Lukas Lipp, David Hahn, Pierre Ecormier-Nocca, Florian Rist, Michael Wimmer","doi":"10.1145/3662180","DOIUrl":"https://doi.org/10.1145/3662180","url":null,"abstract":"<p>Differentiable rendering methods promise the ability to optimize various parameters of 3d scenes to achieve a desired result. However, lighting design has so far received little attention in this field. In this paper, we introduce a method that enables continuous optimization of the arrangement of luminaires in a 3d scene via differentiable light tracing. Our experiments show two major issues when attempting to apply existing methods from differentiable path tracing to this problem: first, many rendering methods produce images, which restricts the ability of a designer to define lighting objectives to image space. Second, most previous methods are designed for scene geometry or material optimization and have not been extensively tested for the case of optimizing light sources. Currently available differentiable ray-tracing methods do not provide satisfactory performance, even on fairly basic test cases in our experience. In this paper, we propose a novel adjoint light tracing method that overcomes these challenges and enables gradient-based lighting design optimization in a view-independent (camera-free) way. Thus, we allow the user to paint illumination targets directly onto the 3d scene or use existing baked illumination data (e.g., light maps). Using modern ray-tracing hardware, we achieve interactive performance. We find light tracing advantageous over path tracing in this setting, as it naturally handles irregular geometry, resulting in less noise and improved optimization convergence. We compare our adjoint gradients to state-of-the-art image-based differentiable rendering methods. We also demonstrate that our gradient data works with various common optimization algorithms, providing good convergence behaviour. Qualitative comparisons with real-world scenes underline the practical applicability of our method.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"7 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140820830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mesh processing algorithms are often communicated via concise mathematical notation (e.g., summation over mesh neighborhoods). However, conversion of notation into working code remains a time-consuming and error-prone process that requires arcane knowledge of low-level data structures and libraries, impeding rapid exploration of high-level algorithms. We address this problem by introducing a domain-specific language (DSL) for mesh processing called I❤MESH, which resembles notation commonly used in visual and geometric computing and automates the process of converting notation into code. The centerpiece of our language is a flexible notation for specifying and manipulating neighborhoods of a cell complex, internally represented via standard operations on sparse boundary matrices. This layered design enables natural expression of algorithms while minimizing demands on a code-generation back-end. In particular, by integrating I❤MESH with the linear algebra features of the I❤LA DSL, and adding support for automatic differentiation, we can rapidly implement a rich variety of algorithms on point clouds, surface meshes, and volume meshes.
{"title":"I❤MESH: A DSL for Mesh Processing","authors":"Yong Li, Shoaib Kamil, Keenan Crane, Alec Jacobson, Yotam Gingold","doi":"10.1145/3662181","DOIUrl":"https://doi.org/10.1145/3662181","url":null,"abstract":"<p>Mesh processing algorithms are often communicated via concise mathematical notation (e.g., summation over mesh neighborhoods). However, conversion of notation into working code remains a time consuming and error-prone process which requires arcane knowledge of low-level data structures and libraries—impeding rapid exploration of high-level algorithms. We address this problem by introducing a domain-specific language (DSL) for mesh processing called I❤MESH, which resembles notation commonly used in visual and geometric computing, and automates the process of converting notation into code. The centerpiece of our language is a flexible notation for specifying and manipulating neighborhoods of a cell complex, internally represented via standard operations on sparse boundary matrices. This layered design enables natural expression of algorithms while minimizing demands on a code generation back-end. In particular, by integrating I❤MESH with the linear algebra features of the I❤LA DSL, and adding support for automatic differentiation, we can rapidly implement a rich variety of algorithms on point clouds, surface meshes, and volume meshes.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"4 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140817849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter
This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike comparisons across different research papers, differences in results here are due only to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field.
The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall's tau rank correlation of around −0.5. Based on the challenge results, we formulate numerous recommendations for system building and evaluation.
{"title":"Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022","authors":"Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter","doi":"10.1145/3656374","DOIUrl":"https://doi.org/10.1145/3656374","url":null,"abstract":"<p>This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field. </p><p>The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall’s tau rank correlation of around (-0.5). Based on the challenge results we formulate numerous recommendations for system building and evaluation.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"9 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140651568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a general differentiable solver for time-dependent deformation problems with contact and friction. Our approach uses a finite element discretization with a high-order time integrator coupled with the recently proposed incremental potential contact method for handling contact and friction forces to solve ODE- and PDE-constrained optimization problems on scenes with complex geometry. It supports static and dynamic problems and differentiation with respect to all physical parameters involved in the physical problem description, which include shape, material parameters, friction parameters, and initial conditions. Our analytically derived adjoint formulation is efficient, with a small overhead (typically less than 10% for nonlinear problems) over the forward simulation, and shares many similarities with the forward problem, allowing the reuse of large parts of existing forward simulator code.
We implement our approach on top of the open-source PolyFEM library and demonstrate the applicability of our solver to shape design, initial condition optimization, and material estimation on both simulated results and physical validations.
{"title":"Differentiable solver for time-dependent deformation problems with contact","authors":"Zizhou Huang, Davi Colli Tozoni, Arvi Gjoka, Zachary Ferguson, Teseo Schneider, Daniele Panozzo, Denis Zorin","doi":"10.1145/3657648","DOIUrl":"https://doi.org/10.1145/3657648","url":null,"abstract":"<p>We introduce a general differentiable solver for time-dependent deformation problems with contact and friction. Our approach uses a finite element discretization with a high-order time integrator coupled with the recently proposed incremental potential contact method for handling contact and friction forces to solve ODE- and PDE-constrained optimization problems on scenes with complex geometry. It supports static and dynamic problems and differentiation with respect to all physical parameters involved in the physical problem description, which include shape, material parameters, friction parameters, and initial conditions. Our analytically derived adjoint formulation is efficient, with a small overhead (typically less than 10% for nonlinear problems) over the forward simulation, and shares many similarities with the forward problem, allowing the reuse of large parts of existing forward simulator code. </p><p>We implement our approach on top of the open-source PolyFEM library and demonstrate the applicability of our solver to shape design, initial condition optimization, and material estimation on both simulated results and physical validations.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"8 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140651314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn
We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations.
Our appearance model utilizes learned hierarchical textures that are interpreted using neural decoders, which produce reflectance values and importance-sampled directions. To best utilize the modeling capacity of the decoders, we equip the decoders with two graphics priors. The first prior—transformation of directions into learned shading frames—facilitates accurate reconstruction of mesoscale effects. The second prior—a microfacet sampling distribution—allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows baking deeply layered material graphs into a compact unified neural representation.
By exposing hardware-accelerated tensor operations to ray tracing shaders, we show that it is possible to inline and execute the neural decoders efficiently inside a real-time path tracer. We analyze scalability with an increasing number of neural materials and propose to improve performance using code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens the door to using film-quality visuals in real-time applications such as games and live previews.
{"title":"Real-Time Neural Appearance Models","authors":"Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn","doi":"10.1145/3659577","DOIUrl":"https://doi.org/10.1145/3659577","url":null,"abstract":"<p>We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations. </p><p>Our appearance model utilizes learned hierarchical textures that are interpreted using neural decoders, which produce reflectance values and importance-sampled directions. To best utilize the modeling capacity of the decoders, we equip the decoders with two graphics priors. The first prior—transformation of directions into learned shading frames—facilitates accurate reconstruction of mesoscale effects. The second prior—a microfacet sampling distribution—allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows baking deeply layered material graphs into a compact unified neural representation. </p><p>By exposing hardware accelerated tensor operations to ray tracing shaders, we show that it is possible to inline and execute the neural decoders efficiently inside a real-time path tracer. We analyze scalability with increasing number of neural materials and propose to improve performance using code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens up the door for using film-quality visuals in real-time applications such as games and live previews.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"16 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140621586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or
Recent text-to-image generative models have enabled us to transform our words into vibrant, captivating imagery. The surge of personalization techniques that has followed has also allowed us to imagine unique concepts in new scenes. However, an intriguing question remains: How can we generate a new, imaginary concept that has never been seen before? In this paper, we present the task of creative text-to-image generation, where we seek to generate new members of a broad category (e.g., generating a pet that differs from all existing pets). We leverage the under-studied Diffusion Prior models and show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior, resulting in a set of “prior constraints”. To keep our generated concept from converging into existing members, we incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly more unique creations. Finally, we show that our prior constraints can also serve as a strong mixing mechanism allowing us to create hybrids between generated concepts, introducing even more flexibility into the creative process.
{"title":"ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints","authors":"Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or","doi":"10.1145/3659578","DOIUrl":"https://doi.org/10.1145/3659578","url":null,"abstract":"<p>Recent text-to-image generative models have enabled us to transform our words into vibrant, captivating imagery. The surge of personalization techniques that has followed has also allowed us to imagine unique concepts in new scenes. However, an intriguing question remains: How can we generate a <i>new</i>, imaginary concept that has never been seen before? In this paper, we present the task of <i>creative text-to-image generation</i>, where we seek to generate new members of a broad category (e.g., generating a pet that differs from all existing pets). We leverage the under-studied Diffusion Prior models and show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior, resulting in a set of “prior constraints”. To keep our generated concept from converging into existing members, we incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly more unique creations. Finally, we show that our prior constraints can also serve as a strong mixing mechanism allowing us to create hybrids between generated concepts, introducing even more flexibility into the creative process.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"25 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140557143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haipeng Li, Hai Jiang, Ao Luo, Ping Tan, Haoqiang Fan, Bing Zeng, Shuaicheng Liu
Supervised homography estimation methods face a challenge due to the lack of adequate labeled training data. To address this issue, we propose DMHomo, a diffusion model-based framework for supervised homography learning. This framework generates image pairs with accurate labels, realistic image content, and realistic interval motion, ensuring that they constitute adequate training pairs. We utilize unlabeled image pairs with pseudo-labels such as homography and dominant plane masks, computed from existing methods, to train a diffusion model that generates a supervised training dataset. To further enhance performance, we introduce a new probabilistic mask loss, which identifies outlier regions through supervised training, and an iterative mechanism to optimize the generative and homography models successively. Our experimental results demonstrate that DMHomo effectively overcomes the scarcity of qualified datasets in supervised homography learning and improves generalization to real-world scenes. The code and dataset are available at: https://github.com/lhaippp/DMHomo
{"title":"DMHomo: Learning Homography with Diffusion Models","authors":"Haipeng Li, Hai Jiang, Ao Luo, Ping Tan, Haoqiang Fan, Bing Zeng, Shuaicheng Liu","doi":"10.1145/3652207","DOIUrl":"https://doi.org/10.1145/3652207","url":null,"abstract":"<p>Supervised homography estimation methods face a challenge due to the lack of adequate labeled training data. To address this issue, we propose DMHomo, a diffusion model-based framework for supervised homography learning. This framework generates image pairs with accurate labels, realistic image content, and realistic interval motion, ensuring they satisfy adequate pairs. We utilize unlabeled image pairs with pseudo-labels such as homography and dominant plane masks, computed from existing methods, to train a diffusion model that generates a supervised training dataset. To further enhance performance, we introduce a new probabilistic mask loss, which identifies outlier regions through supervised training, and an iterative mechanism to optimize the generative and homography models successively. Our experimental results demonstrate that DMHomo effectively overcomes the scarcity of qualified datasets in supervised homography learning and improves generalization to real-world scenes. The code and dataset are available at: https://github.com/lhaippp/DMHomo</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"107 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140096891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To alleviate the human labor of redrawing keyframes with ordered vector strokes for automatic inbetweening, we propose, for the first time, a joint stroke tracing and correspondence approach. Given consecutive raster keyframes along with a single vector image of the starting frame as guidance, the approach generates vector drawings for the remaining keyframes while ensuring one-to-one stroke correspondence. Our framework, trained on clean line drawings, generalizes to rough sketches, and the generated results can be imported into inbetweening systems to produce inbetween sequences. Hence, the method is compatible with the standard 2D animation workflow. An adaptive spatial transformation module (ASTM) is introduced to handle non-rigid motions and stroke distortion. We collect a dataset for training, with 10k+ pairs of raster frames and their vector drawings with stroke correspondence. Comprehensive validations on real clean and rough animated frames demonstrate the effectiveness of our method and its superiority to existing methods.
{"title":"Joint Stroke Tracing and Correspondence for 2D Animation","authors":"Haoran Mo, Chengying Gao, Ruomei Wang","doi":"10.1145/3649890","DOIUrl":"https://doi.org/10.1145/3649890","url":null,"abstract":"<p>To alleviate human labor in redrawing keyframes with ordered vector strokes for automatic inbetweening, we for the first time propose a joint stroke tracing and correspondence approach. Given consecutive raster keyframes along with a single vector image of the starting frame as a guidance, the approach generates vector drawings for the remaining keyframes while ensuring one-to-one stroke correspondence. Our framework trained on clean line drawings generalizes to rough sketches and the generated results can be imported into inbetweening systems to produce inbetween sequences. Hence, the method is compatible with standard 2D animation workflow. An adaptive spatial transformation module (ASTM) is introduced to handle non-rigid motions and stroke distortion. We collect a dataset for training, with 10k+ pairs of raster frames and their vector drawings with stroke correspondence. Comprehensive validations on real clean and rough animated frames manifest the effectiveness of our method and superiority to existing methods.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"52 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139994136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shusen Liu, Xiaowei He, Yuzhong Guo, Yue Chang, Wencheng Wang
Tensile instability is one of the major obstacles to particle methods in fluid simulation: it causes particles to clump in pairs under tension and prevents the simulation from generating small-scale thin features. To address this issue, previous particle methods either use a background pressure or a finite difference scheme to alleviate the particle clustering artifacts, yet they still fail to produce small-scale thin features in free-surface flows. In this paper, we propose a dual-particle approach for simulating incompressible fluids. Our approach incorporates supplementary virtual particles designed to capture and store particle pressures. These pressure samples undergo systematic redistribution at each time step, grounded in the initial positions of the fluid particles. By doing so, we effectively reduce tensile instability in standard SPH by narrowing down the unstable regions for particles experiencing tensile stress. As a result, we can accurately simulate free-surface flows with rich small-scale thin features, such as droplets, streamlines, and sheets, as demonstrated by our experimental results.
{"title":"A Dual-Particle Approach for Incompressible SPH Fluids","authors":"Shusen Liu, Xiaowei He, Yuzhong Guo, Yue Chang, Wencheng Wang","doi":"10.1145/3649888","DOIUrl":"https://doi.org/10.1145/3649888","url":null,"abstract":"<p>Tensile instability is one of the major obstacles to particle methods in fluid simulation, which would cause particles to clump in pairs under tension and prevent fluid simulation to generate small-scale thin features. To address this issue, previous particle methods either use a background pressure or a finite difference scheme to alleviate the particle clustering artifacts, yet still fail to produce small-scale thin features in free-surface flows. In this paper, we propose a dual-particle approach for simulating incompressible fluids. Our approach involves incorporating supplementary virtual particles designed to capture and store particle pressures. These pressure samples undergo systematic redistribution at each time step, grounded in the initial positions of the fluid particles. By doing so, we effectively reduce tensile instability in standard SPH by narrowing down the unstable regions for particles experiencing tensile stress. As a result, we can accurately simulate free-surface flows with rich small-scale thin features, such as droplets, streamlines, and sheets, as demonstrated by experimental results.</p>","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"48 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139994048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}