Zhongjin Luo, Haolin Liu, Chenghong Li, Wanghao Du, Zirong Jin, Wanhu Sun, Yinyu Nie, Weikai Chen, Xiaoguang Han
Neural implicit functions have brought impressive advances to the state-of-the-art of clothed human digitization from multiple or even single images. However, despite the progress, current methods still have difficulty generalizing to unseen images with complex cloth deformation and body poses. In this work, we present GarVerseLOD, a new dataset and framework that paves the way to achieving unprecedented robustness in high-fidelity 3D garment reconstruction from a single unconstrained image. Inspired by the recent success of large generative models, we believe that one key to addressing the generalization challenge lies in the quantity and quality of 3D garment data. Towards this end, GarVerseLOD collects 6,000 high-quality cloth models with fine-grained geometry details manually created by professional artists. In addition to the scale of training data, we observe that having disentangled granularities of geometry can play an important role in boosting the generalization capability and inference accuracy of the learned model. We hence craft GarVerseLOD as a hierarchical dataset with levels of details (LOD), spanning from detail-free stylized shapes to pose-blended garments with pixel-aligned details. This allows us to make this highly under-constrained problem tractable by factorizing the inference into easier tasks, each narrowed down to a smaller search space. To ensure GarVerseLOD can generalize well to in-the-wild images, we propose a novel labeling paradigm based on conditional diffusion models to generate extensive paired images for each garment model with high photorealism. We evaluate our method on a large number of in-the-wild images. Experimental results demonstrate that GarVerseLOD can generate standalone garment pieces with significantly better quality than prior approaches while being robust against a large variation of pose, illumination, occlusion, and deformation. Code and dataset are available at garverselod.github.io.
{"title":"GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details","authors":"Zhongjin Luo, Haolin Liu, Chenghong Li, Wanghao Du, Zirong Jin, Wanhu Sun, Yinyu Nie, Weikai Chen, Xiaoguang Han","doi":"10.1145/3687921","DOIUrl":"https://doi.org/10.1145/3687921","url":null,"abstract":"Neural implicit functions have brought impressive advances to the state-of-the-art of clothed human digitization from multiple or even single images. However, despite the progress, current arts still have difficulty generalizing to unseen images with complex cloth deformation and body poses. In this work, we present GarVerseLOD, a new dataset and framework that paves the way to achieving unprecedented robustness in high-fidelity 3D garment reconstruction from a single unconstrained image. Inspired by the recent success of large generative models, we believe that one key to addressing the generalization challenge lies in the quantity and quality of 3D garment data. Towards this end, GarVerseLOD collects 6,000 high-quality cloth models with fine-grained geometry details manually created by professional artists. In addition to the scale of training data, we observe that having disentangled granularities of geometry can play an important role in boosting the generalization capability and inference accuracy of the learned model. We hence craft GarVerseLOD as a hierarchical dataset with <jats:italic>levels of details (LOD)</jats:italic> , spanning from detail-free stylized shape to pose-blended garment with pixel-aligned details. This allows us to make this highly under-constrained problem tractable by factorizing the inference into easier tasks, each narrowed down with smaller searching space. To ensure GarVerseLOD can generalize well to in-the-wild images, we propose a novel labeling paradigm based on conditional diffusion models to generate extensive paired images for each garment model with high photorealism. We evaluate our method on a massive amount of in-the-wild images. Experimental results demonstrate that GarVerseLOD can generate standalone garment pieces with significantly better quality than prior approaches while being robust against a large variation of pose, illumination, occlusion, and deformation. Code and dataset are available at garverselod.github.io.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"80 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While previous works have explored methods to enhance the aesthetics of images, the automated beautification of 3D shapes has been limited to specific shapes such as 3D face models. In this paper, we introduce a framework to automatically enhance the aesthetics of general 3D shapes. Our approach employs a reference-based beautification strategy. We first collect aesthetics ratings of various 3D shapes to create a 3D shape aesthetics dataset. We then perform reference-based editing to beautify the input shape by making it resemble an aesthetically pleasing reference shape. Specifically, we propose a reference-guided global deformation framework to coherently deform the input shape such that its structural proportions become closer to those of the reference shape. We then optionally transplant some local aesthetic parts from the reference to the input to obtain the beautified output shapes. Comparisons show that our reference-guided 3D deformation algorithm outperforms existing techniques. Furthermore, quantitative and qualitative evaluations demonstrate that the performance of our aesthetics enhancement framework is consistent with both human perception and existing 3D shape aesthetics assessment.
{"title":"Enhancing the Aesthetics of 3D Shapes via Reference-based Editing","authors":"Minchan Chen, Manfred Lau","doi":"10.1145/3687954","DOIUrl":"https://doi.org/10.1145/3687954","url":null,"abstract":"While there have been previous works that explored methods to enhance the aesthetics of images, the automated beautification of 3D shapes has been limited to specific shapes such as 3D face models. In this paper, we introduce a framework to automatically enhance the aesthetics of general 3D shapes. Our approach employs a reference-based beautification strategy. We first performed data collection to gather the aesthetics ratings of various 3D shapes to create a 3D shape aesthetics dataset. Then we perform reference-based editing to edit the input shape and beautify it by making it look more like some reference shape that is aesthetic. Specifically, we propose a reference-guided global deformation framework to coherently deform the input shape such that its structural proportions will be closer to those of the reference shape. We then optionally transplant some local aesthetic parts from the reference to the input to obtain the beautified output shapes. Comparisons show that our reference-guided 3D deformation algorithm outperforms existing techniques. Furthermore, quantitative and qualitative evaluations demonstrate that the performance of our aesthetics enhancement framework is consistent with both human perception and existing 3D shape aesthetics assessment.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"176 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bastien Doignies, David Coeurjolly, Nicolas Bonneel, Julie Digne, Jean-Claude Iehl, Victor Ostromoukhov
Quasi-Monte Carlo integration is at the core of rendering. This technique estimates the value of an integral by evaluating the integrand at well-chosen sample locations. These sample points are designed to cover the domain as uniformly as possible to achieve better convergence rates than purely random points. Deterministic low-discrepancy sequences have been shown to outperform many competitors by guaranteeing good uniformity as measured by the so-called discrepancy metric and, indirectly, by an integer t-value relating the number of points falling into each domain stratum to the stratum area (lower t is better). To achieve randomness, scrambling techniques produce multiple realizations preserving the t-value, making the construction stochastic. Among them, Owen scrambling is a popular approach that recursively permutes intervals for each dimension. However, relying on permutation trees makes it incompatible with smooth optimization frameworks. We present a differentiable Owen scrambling that regularizes permutations. We show that it can effectively be used with automatic differentiation tools for optimizing low-discrepancy sequences to improve metrics such as optimal transport uniformity, integration error, designed power spectra, or projective properties, while maintaining their initial t-value as guaranteed by Owen scrambling. In some rendering settings, we show that our optimized sequences reduce the rendering error.
{"title":"Differentiable Owen Scrambling","authors":"Bastien Doignies, David Coeurjolly, Nicolas Bonneel, Julie Digne, Jean-Claude Iehl, Victor Ostromoukhov","doi":"10.1145/3687764","DOIUrl":"https://doi.org/10.1145/3687764","url":null,"abstract":"Quasi-Monte Carlo integration is at the core of rendering. This technique estimates the value of an integral by evaluating the integrand at well-chosen sample locations. These sample points are designed to cover the domain as uniformly as possible to achieve better convergence rates than purely random points. Deterministic low-discrepancy sequences have been shown to outperform many competitors by guaranteeing good uniformity as measured by the so-called discrepancy metric, and, indirectly, by an integer <jats:italic>t</jats:italic> value relating the number of points falling into each domain stratum with the stratum area (lower <jats:italic>t</jats:italic> is better). To achieve randomness, scrambling techniques produce multiple realizations preserving the <jats:italic>t</jats:italic> value, making the construction stochastic. Among them, Owen scrambling is a popular approach that recursively permutes intervals for each dimension. However, relying on permutation trees makes it incompatible with smooth optimization frameworks. We present a differentiable Owen scrambling that regularizes permutations. We show that it can effectively be used with automatic differentiation tools for optimizing low-discrepancy sequences to improve metrics such as optimal transport uniformity, integration error, designed power spectra or projective properties, while maintaining their initial <jats:italic>t</jats:italic> -value as guaranteed by Owen scrambling. In some rendering settings, we show that our optimized sequences improve the rendering error.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"22 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the primary reasons for the high cost of traditional animation is the inbetweening process, where artists manually draw each intermediate frame necessary for smooth motion. Making this process more efficient has been at the core of computer graphics research for years, yet the industry has adopted very few solutions. Most existing solutions either require vector input or resort to tight inbetweening; often, they attempt to fully automate the process. In industry, however, keyframes are often spaced far apart, drawn in raster format, and contain occlusions. Moreover, inbetweening is fundamentally an artistic process, so the artist should maintain high-level control over it. We address these issues by proposing a novel inbetweening system for bitmap character drawings, supporting both tight and far inbetweening. In our setup, the artist can control motion by animating a skeleton between the keyframe poses. Our system then performs skeleton-based deformation of the bitmap drawings into the same pose and employs discrete optimization and deep learning to blend the deformed images. Besides the skeleton and the two drawn bitmap keyframes, we require very little annotation. However, deforming drawings with occlusions is complex, as it requires a piecewise smooth deformation field. To address this, we observe that this deformation field is smooth when the drawing is lifted into 3D. Our system therefore optimizes the topology of a 2.5D partially layered template that we use to lift the drawing into 3D and obtain the final piecewise-smooth deformation, effectively resolving occlusions. We validate our system through a series of animations, qualitative and quantitative comparisons, and user studies, demonstrating that our approach consistently outperforms the state of the art and that our results are consistent with viewers' perception. Code and data for our paper are available at http://www-labs.iro.umontreal.ca/~bmpix/inbetweening/.
{"title":"Skeleton-Driven Inbetweening of Bitmap Character Drawings","authors":"Kirill Brodt, Mikhail Bessmeltsev","doi":"10.1145/3687955","DOIUrl":"https://doi.org/10.1145/3687955","url":null,"abstract":"One of the primary reasons for the high cost of traditional animation is the inbetweening process, where artists manually draw each intermediate frame necessary for smooth motion. Making this process more efficient has been at the core of computer graphics research for years, yet the industry has adopted very few solutions. Most existing solutions either require vector input or resort to tight inbetweening; often, they attempt to fully automate the process. In industry, however, keyframes are often spaced far apart, drawn in raster format, and contain occlusions. Moreover, inbetweening is fundamentally an artistic process, so the artist should maintain high-level control over it. We address these issues by proposing a novel inbetweening system for bitmap character drawings, supporting both <jats:italic>tight</jats:italic> and <jats:italic>far</jats:italic> inbetweening. In our setup, the artist can control motion by animating a skeleton between the keyframe poses. Our system then performs skeleton-based deformation of the bitmap drawings into the same pose and employs discrete optimization and deep learning to blend the deformed images. Besides the skeleton and the two drawn bitmap keyframes, we require very little annotation. However, deforming drawings with occlusions is complex, as it requires a piecewise smooth deformation field. To address this, we observe that this deformation field is smooth when the drawing is lifted into 3D. Our system therefore optimizes topology of a 2.5D partially layered template that we use to lift the drawing into 3D and get the final piecewise-smooth deformaton, effectively resolving occlusions. We validate our system through a series of animations, qualitative and quantitative comparisons, and user studies, demonstrating that our approach consistently outperforms the state of the art and our results are consistent with the viewers' perception. Code and data for our paper are available at http://www-labs.iro.umontreal.ca/~bmpix/inbetweening/.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"69 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guojin Huang, Qing Fang, Zheng Zhang, Ligang Liu, Xiao-Ming Fu
We propose a simple yet effective method to orient normals for point clouds. Central to our approach is a novel optimization objective function defined from global and local perspectives. Globally, we introduce a signed uncertainty function that distinguishes the inside and outside of the underlying surface. Moreover, benefiting from the statistics of our global term, we present a local orientation term instead of a global one. The optimization problem can be solved with commonly used numerical solvers, such as L-BFGS. The capability and feasibility of our approach are demonstrated on various complex point clouds. We achieve higher practical robustness and normal quality than state-of-the-art methods.
{"title":"Stochastic Normal Orientation for Point Clouds","authors":"Guojin Huang, Qing Fang, Zheng Zhang, Ligang Liu, Xiao-Ming Fu","doi":"10.1145/3687944","DOIUrl":"https://doi.org/10.1145/3687944","url":null,"abstract":"We propose a simple yet effective method to orient normals for point clouds. Central to our approach is a novel optimization objective function defined from global and local perspectives. Globally, we introduce a signed uncertainty function that distinguishes the inside and outside of the underlying surface. Moreover, benefiting from the statistics of our global term, we present a local orientation term instead of a global one. The optimization problem can be solved by the commonly used numerical optimization solver, such as L-BFGS. The capability and feasibility of our approach are demonstrated over various complex point clouds. We achieve higher practical robustness and normal quality than the state-of-the-art methods.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"70 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized parallel implementations on GPU or CPU have dramatically enhanced the fidelity, resolution, and accuracy of physical simulations and mesh-based algorithms. However, attaining optimal performance requires expert knowledge and might demand complex code and memory layout optimizations. In addition, physical simulation algorithms require the implementation of derivatives, which can be a tedious and error-prone process. In recent years, researchers and practitioners have investigated the concept of designing systems that allow for a more expressive definition of mesh-based simulation code. These systems leverage domain-specific languages (DSLs), automatic differentiation, or symbolic computing to enhance the readability of implementations without compromising performance. We follow this line of work and propose a symbolic code generation approach tailored to mesh-based computations on parallel devices. Our system extends related work by incorporating collision handling and a data access synchronization approach, enabling rapid sparse matrix assembly.
{"title":"A Mesh-based Simulation Framework using Automatic Code Generation","authors":"Philipp Herholz, Tuur Stuyck, Ladislav Kavan","doi":"10.1145/3687986","DOIUrl":"https://doi.org/10.1145/3687986","url":null,"abstract":"Optimized parallel implementations on GPU or CPU have dramatically enhanced the fidelity, resolution and accuracy of physical simulations and mesh-based algorithms. However, attaining optimal performance requires expert knowledge and might demand complex code and memory layout optimizations. This adds to the fact that physical simulation algorithms require the implementation of derivatives, which can be a tedious and error-prone process. In recent years, researchers and practitioners have investigated the concept of designing systems that allow for a more expressive definition of mesh-based simulation code. These systems leverage domain-specific languages (DSL), automatic differentiation or symbolic computing to enhance readability of implementations without compromising performance. We follow this line of work and propose a symbolic code generation approach tailored to mesh-based computations on parallel devices. Our system extends related work by incorporating collision handling and a data access synchronization approach, enabling rapid sparse matrix assembly.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caigui Jiang, Dmitry Lyakhov, Florian Rist, Helmut Pottmann, Johannes Wallner
This paper provides computational tools for the modeling and design of quad mesh mechanisms, which are meshes that allow continuous flexions under the assumption of rigid faces and hinges at the edges. We combine methods and results from different areas, namely differential geometry of surfaces, rigidity and flexibility of bar-and-joint frameworks, algebraic geometry, and optimization. The basic idea for achieving a time-continuous flexion is time discretization, justified by an algebraic degree argument. We prove computationally feasible bounds on the number of time instances that must be incorporated in the optimization. For optimization to succeed, an informed initialization is crucial. We present two computational pipelines to achieve this: one based on remeshing isometric surface pairs, the other based on iterative refinement. A third manner of initialization proved very effective: we interactively design meshes that are close to a narrow known class of flexible meshes but not contained in it. Having enjoyed sufficiently many degrees of freedom during design, we afterwards optimize towards flexibility.
{"title":"Quad mesh mechanisms","authors":"Caigui Jiang, Dmitry Lyakhov, Florian Rist, Helmut Pottmann, Johannes Wallner","doi":"10.1145/3687939","DOIUrl":"https://doi.org/10.1145/3687939","url":null,"abstract":"This paper provides computational tools for the modeling and design of quad mesh mechanisms, which are meshes allowing continuous flexions under the assumption of rigid faces and hinges in the edges. We combine methods and results from different areas, namely differential geometry of surfaces, rigidity and flexibility of bar and joint frameworks, algebraic geometry, and optimization. The basic idea to achieve a time-continuous flexion is time-discretization justified by an algebraic degree argument. We are able to prove computationally feasible bounds on the number of required time instances we need to incorporate in our optimization. For optimization to succeed, an informed initialization is crucial. We present two computational pipelines to achieve that: one based on remeshing isometric surface pairs, another one based on iterative refinement. A third manner of initialization proved very effective: We interactively design meshes which are close to a narrow known class of flexible meshes, but not contained in it. Having enjoyed sufficiently many degrees of freedom during design, we afterwards optimize towards flexibility.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"7 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Style-guided texture generation aims to generate a texture that is harmonious with both the style of a reference image and the geometry of an input mesh, given the reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, they require addressing two challenges: 1) completely decoupling style and content from the reference image for 3D models, and 2) aligning the generated texture with the color tone and style of the reference image as well as the given text prompt. To this end, we introduce StyleTex, an innovative diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding content in diffusion-based distillation sampling. Specifically, given a reference image, we first decompose its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. Our novel approach to disentangling the reference image's style and content information allows us to generate distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while utilizing the content feature as a negative prompt to further dissociate content information. Finally, we incorporate these strategies into StyleTex to obtain stylized textures. We utilize Interval Score Matching to address over-smoothness and over-saturation, in combination with a geometry-aware ControlNet that ensures consistent geometry throughout the generative process. The resulting textures generated by StyleTex retain the style of the reference image, while also aligning with the text prompts and intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baseline methods by a significant margin.
{"title":"StyleTex: Style Image-Guided Texture Generation for 3D Models","authors":"Zhiyu Xie, Yuqing Zhang, Xiangjun Tang, Yiqian Wu, Dehan Chen, Gongsheng Li, Xiaogang Jin","doi":"10.1145/3687931","DOIUrl":"https://doi.org/10.1145/3687931","url":null,"abstract":"Style-guided texture generation aims to generate a texture that is harmonious with both the style of the reference image and the geometry of the input mesh, given a reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, it requires addressing two challenges: 1) decouple style and content completely from the reference image for 3D models, and 2) align the generated texture with the color tone, style of the reference image, and the given text prompt. To this end, we introduce StyleTex, an innovative diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding content in diffusion-based distillation sampling. Specifically, given a reference image, we first decompose its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. Our novel approach to disentangling the reference image's style and content information allows us to generate distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while utilizing the content feature as a negative prompt to further dissociate content information. Finally, we incorporate these strategies into StyleTex to obtain stylized textures. We utilize Interval Score Matching to address over-smoothness and over-saturation, in combination with a geometry-aware ControlNet that ensures consistent geometry throughout the generative process. The resulting textures generated by StyleTex retain the style of the reference image, while also aligning with the text prompts and intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baseline methods by a significant margin.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"99 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views contain very limited 3D information, leading to two significant challenges: 1) difficulty in building multi-view consistency, as there are too few images for matching; and 2) partially omitted or highly compressed object information, as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render 3D objects with Gaussian splatting that achieves high rendering quality with only four input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, where pre-given accurate camera poses are not required, which achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and unposed images we collected, achieving superior performance from only four views and significantly outperforming previous SOTA methods.
{"title":"GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting","authors":"Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian","doi":"10.1145/3687759","DOIUrl":"https://doi.org/10.1145/3687759","url":null,"abstract":"Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, where pre-given accurate camera poses are not required, which achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our-collected unposed images, achieving superior performance from only four views and significantly outperforming previous SOTA methods.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"33 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MVImgNet is a large-scale dataset that contains multi-view images of ~220k real-world objects in 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, making a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset, which expands MVImgNet to a total of ~520k objects and 515 categories, yielding a larger-scale 3D dataset that is more comparable to those in the 2D domain. In addition to the expanded dataset scale and category range, MVImgNet2.0 is of a higher quality than MVImgNet owing to four new features: (i) most shoots capture 360° views of the objects, which can support learning complete object reconstruction; (ii) the segmentation is improved to produce foreground object masks of higher accuracy; (iii) a more powerful structure-from-motion method is adopted to derive the camera pose for each frame with lower estimation error; (iv) higher-quality dense point clouds are reconstructed via advanced methods for objects captured in 360° views, which can serve downstream applications. Extensive experiments confirm the value of the proposed MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be publicly available at luyues.github.io/mvimgnet2, including multi-view images of all 520k objects, the reconstructed high-quality point clouds, and data annotation codes, hoping to inspire the broader vision community.
{"title":"MVImgNet2.0: A Larger-scale Dataset of Multi-view Images","authors":"Yushuang Wu, Luyue Shi, Haolin Liu, Hongjie Liao, Lingteng Qiu, Weihao Yuan, Xiaodong Gu, Zilong Dong, Shuguang Cui, Xiaoguang Han","doi":"10.1145/3687973","DOIUrl":"https://doi.org/10.1145/3687973","url":null,"abstract":"MVImgNet is a large-scale dataset that contains multi-view images of ~220k real-world objects in 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, making a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset that expands MVImgNet into a total of ~520k objects and 515 categories, which derives a 3D dataset with a larger scale that is more comparable to ones in the 2D domain. In addition to the expanded dataset scale and category range, MVImgNet2.0 is of a higher quality than MVImgNet owing to four new features: (i) most shoots capture 360° views of the objects, which can support the learning of object reconstruction with completeness; (ii) the segmentation manner is advanced to produce foreground object masks of higher accuracy; (iii) a more powerful structure-from-motion method is adopted to derive the camera pose for each frame of a lower estimation error; (iv) higher-quality dense point clouds are reconstructed via advanced methods for objects captured in 360 <jats:sup>°</jats:sup> views, which can serve for downstream applications. Extensive experiments confirm the value of the proposed MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be public at <jats:italic>luyues.github.io/mvimgnet2</jats:italic> , including multi-view images of all 520k objects, the reconstructed high-quality point clouds, and data annotation codes, hoping to inspire the broader vision community.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"80 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}