Zhongjin Luo, Haolin Liu, Chenghong Li, Wanghao Du, Zirong Jin, Wanhu Sun, Yinyu Nie, Weikai Chen, Xiaoguang Han
Neural implicit functions have brought impressive advances to the state-of-the-art of clothed human digitization from multiple or even single images. However, despite the progress, current methods still have difficulty generalizing to unseen images with complex cloth deformation and body poses. In this work, we present GarVerseLOD, a new dataset and framework that paves the way to achieving unprecedented robustness in high-fidelity 3D garment reconstruction from a single unconstrained image. Inspired by the recent success of large generative models, we believe that one key to addressing the generalization challenge lies in the quantity and quality of 3D garment data. Towards this end, GarVerseLOD collects 6,000 high-quality cloth models with fine-grained geometry details manually created by professional artists. In addition to the scale of training data, we observe that having disentangled granularities of geometry can play an important role in boosting the generalization capability and inference accuracy of the learned model. We hence craft GarVerseLOD as a hierarchical dataset with levels of details (LOD), spanning from detail-free stylized shapes to pose-blended garments with pixel-aligned details. This allows us to make this highly under-constrained problem tractable by factorizing the inference into easier tasks, each narrowed down to a smaller search space. To ensure GarVerseLOD can generalize well to in-the-wild images, we propose a novel labeling paradigm based on conditional diffusion models to generate extensive paired images for each garment model with high photorealism. We evaluate our method on a large number of in-the-wild images. Experimental results demonstrate that GarVerseLOD can generate standalone garment pieces with significantly better quality than prior approaches while being robust against a large variation of pose, illumination, occlusion, and deformation. Code and dataset are available at garverselod.github.io.
{"title":"GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details","authors":"Zhongjin Luo, Haolin Liu, Chenghong Li, Wanghao Du, Zirong Jin, Wanhu Sun, Yinyu Nie, Weikai Chen, Xiaoguang Han","doi":"10.1145/3687921","DOIUrl":"https://doi.org/10.1145/3687921","url":null,"abstract":"Neural implicit functions have brought impressive advances to the state-of-the-art of clothed human digitization from multiple or even single images. However, despite the progress, current arts still have difficulty generalizing to unseen images with complex cloth deformation and body poses. In this work, we present GarVerseLOD, a new dataset and framework that paves the way to achieving unprecedented robustness in high-fidelity 3D garment reconstruction from a single unconstrained image. Inspired by the recent success of large generative models, we believe that one key to addressing the generalization challenge lies in the quantity and quality of 3D garment data. Towards this end, GarVerseLOD collects 6,000 high-quality cloth models with fine-grained geometry details manually created by professional artists. In addition to the scale of training data, we observe that having disentangled granularities of geometry can play an important role in boosting the generalization capability and inference accuracy of the learned model. We hence craft GarVerseLOD as a hierarchical dataset with <jats:italic>levels of details (LOD)</jats:italic> , spanning from detail-free stylized shape to pose-blended garment with pixel-aligned details. This allows us to make this highly under-constrained problem tractable by factorizing the inference into easier tasks, each narrowed down with smaller searching space. To ensure GarVerseLOD can generalize well to in-the-wild images, we propose a novel labeling paradigm based on conditional diffusion models to generate extensive paired images for each garment model with high photorealism. We evaluate our method on a massive amount of in-the-wild images. Experimental results demonstrate that GarVerseLOD can generate standalone garment pieces with significantly better quality than prior approaches while being robust against a large variation of pose, illumination, occlusion, and deformation. Code and dataset are available at garverselod.github.io.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"80 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While previous works have explored methods to enhance the aesthetics of images, the automated beautification of 3D shapes has been limited to specific shapes such as 3D face models. In this paper, we introduce a framework to automatically enhance the aesthetics of general 3D shapes. Our approach employs a reference-based beautification strategy. We first collect aesthetics ratings of various 3D shapes to create a 3D shape aesthetics dataset. We then perform reference-based editing to beautify the input shape by making it resemble an aesthetically pleasing reference shape. Specifically, we propose a reference-guided global deformation framework to coherently deform the input shape such that its structural proportions become closer to those of the reference shape. We then optionally transplant some local aesthetic parts from the reference to the input to obtain the beautified output shapes. Comparisons show that our reference-guided 3D deformation algorithm outperforms existing techniques. Furthermore, quantitative and qualitative evaluations demonstrate that the performance of our aesthetics enhancement framework is consistent with both human perception and existing 3D shape aesthetics assessment.
{"title":"Enhancing the Aesthetics of 3D Shapes via Reference-based Editing","authors":"Minchan Chen, Manfred Lau","doi":"10.1145/3687954","DOIUrl":"https://doi.org/10.1145/3687954","url":null,"abstract":"While there have been previous works that explored methods to enhance the aesthetics of images, the automated beautification of 3D shapes has been limited to specific shapes such as 3D face models. In this paper, we introduce a framework to automatically enhance the aesthetics of general 3D shapes. Our approach employs a reference-based beautification strategy. We first performed data collection to gather the aesthetics ratings of various 3D shapes to create a 3D shape aesthetics dataset. Then we perform reference-based editing to edit the input shape and beautify it by making it look more like some reference shape that is aesthetic. Specifically, we propose a reference-guided global deformation framework to coherently deform the input shape such that its structural proportions will be closer to those of the reference shape. We then optionally transplant some local aesthetic parts from the reference to the input to obtain the beautified output shapes. Comparisons show that our reference-guided 3D deformation algorithm outperforms existing techniques. Furthermore, quantitative and qualitative evaluations demonstrate that the performance of our aesthetics enhancement framework is consistent with both human perception and existing 3D shape aesthetics assessment.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"176 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bastien Doignies, David Coeurjolly, Nicolas Bonneel, Julie Digne, Jean-Claude Iehl, Victor Ostromoukhov
Quasi-Monte Carlo integration is at the core of rendering. This technique estimates the value of an integral by evaluating the integrand at well-chosen sample locations. These sample points are designed to cover the domain as uniformly as possible to achieve better convergence rates than purely random points. Deterministic low-discrepancy sequences have been shown to outperform many competitors by guaranteeing good uniformity as measured by the so-called discrepancy metric and, indirectly, by an integer t-value relating the number of points falling into each domain stratum to the stratum area (lower t is better). To achieve randomness, scrambling techniques produce multiple realizations preserving the t-value, making the construction stochastic. Among them, Owen scrambling is a popular approach that recursively permutes intervals for each dimension. However, relying on permutation trees makes it incompatible with smooth optimization frameworks. We present a differentiable Owen scrambling that regularizes permutations. We show that it can effectively be used with automatic differentiation tools for optimizing low-discrepancy sequences to improve metrics such as optimal transport uniformity, integration error, designed power spectra, or projective properties, while maintaining their initial t-value as guaranteed by Owen scrambling. In some rendering settings, we show that our optimized sequences reduce the rendering error.
{"title":"Differentiable Owen Scrambling","authors":"Bastien Doignies, David Coeurjolly, Nicolas Bonneel, Julie Digne, Jean-Claude Iehl, Victor Ostromoukhov","doi":"10.1145/3687764","DOIUrl":"https://doi.org/10.1145/3687764","url":null,"abstract":"Quasi-Monte Carlo integration is at the core of rendering. This technique estimates the value of an integral by evaluating the integrand at well-chosen sample locations. These sample points are designed to cover the domain as uniformly as possible to achieve better convergence rates than purely random points. Deterministic low-discrepancy sequences have been shown to outperform many competitors by guaranteeing good uniformity as measured by the so-called discrepancy metric, and, indirectly, by an integer <jats:italic>t</jats:italic> value relating the number of points falling into each domain stratum with the stratum area (lower <jats:italic>t</jats:italic> is better). To achieve randomness, scrambling techniques produce multiple realizations preserving the <jats:italic>t</jats:italic> value, making the construction stochastic. Among them, Owen scrambling is a popular approach that recursively permutes intervals for each dimension. However, relying on permutation trees makes it incompatible with smooth optimization frameworks. We present a differentiable Owen scrambling that regularizes permutations. We show that it can effectively be used with automatic differentiation tools for optimizing low-discrepancy sequences to improve metrics such as optimal transport uniformity, integration error, designed power spectra or projective properties, while maintaining their initial <jats:italic>t</jats:italic> -value as guaranteed by Owen scrambling. In some rendering settings, we show that our optimized sequences improve the rendering error.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"22 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the primary reasons for the high cost of traditional animation is the inbetweening process, where artists manually draw each intermediate frame necessary for smooth motion. Making this process more efficient has been at the core of computer graphics research for years, yet the industry has adopted very few solutions. Most existing solutions either require vector input or resort to tight inbetweening; often, they attempt to fully automate the process. In industry, however, keyframes are often spaced far apart, drawn in raster format, and contain occlusions. Moreover, inbetweening is fundamentally an artistic process, so the artist should maintain high-level control over it. We address these issues by proposing a novel inbetweening system for bitmap character drawings, supporting both tight and far inbetweening. In our setup, the artist can control motion by animating a skeleton between the keyframe poses. Our system then performs skeleton-based deformation of the bitmap drawings into the same pose and employs discrete optimization and deep learning to blend the deformed images. Besides the skeleton and the two drawn bitmap keyframes, we require very little annotation. However, deforming drawings with occlusions is complex, as it requires a piecewise smooth deformation field. To address this, we observe that this deformation field is smooth when the drawing is lifted into 3D. Our system therefore optimizes the topology of a 2.5D partially layered template that we use to lift the drawing into 3D and obtain the final piecewise-smooth deformation, effectively resolving occlusions. We validate our system through a series of animations, qualitative and quantitative comparisons, and user studies, demonstrating that our approach consistently outperforms the state of the art and that our results are consistent with viewers' perception. Code and data for our paper are available at http://www-labs.iro.umontreal.ca/~bmpix/inbetweening/.
{"title":"Skeleton-Driven Inbetweening of Bitmap Character Drawings","authors":"Kirill Brodt, Mikhail Bessmeltsev","doi":"10.1145/3687955","DOIUrl":"https://doi.org/10.1145/3687955","url":null,"abstract":"One of the primary reasons for the high cost of traditional animation is the inbetweening process, where artists manually draw each intermediate frame necessary for smooth motion. Making this process more efficient has been at the core of computer graphics research for years, yet the industry has adopted very few solutions. Most existing solutions either require vector input or resort to tight inbetweening; often, they attempt to fully automate the process. In industry, however, keyframes are often spaced far apart, drawn in raster format, and contain occlusions. Moreover, inbetweening is fundamentally an artistic process, so the artist should maintain high-level control over it. We address these issues by proposing a novel inbetweening system for bitmap character drawings, supporting both <jats:italic>tight</jats:italic> and <jats:italic>far</jats:italic> inbetweening. In our setup, the artist can control motion by animating a skeleton between the keyframe poses. Our system then performs skeleton-based deformation of the bitmap drawings into the same pose and employs discrete optimization and deep learning to blend the deformed images. Besides the skeleton and the two drawn bitmap keyframes, we require very little annotation. However, deforming drawings with occlusions is complex, as it requires a piecewise smooth deformation field. To address this, we observe that this deformation field is smooth when the drawing is lifted into 3D. Our system therefore optimizes topology of a 2.5D partially layered template that we use to lift the drawing into 3D and get the final piecewise-smooth deformaton, effectively resolving occlusions. We validate our system through a series of animations, qualitative and quantitative comparisons, and user studies, demonstrating that our approach consistently outperforms the state of the art and our results are consistent with the viewers' perception. Code and data for our paper are available at http://www-labs.iro.umontreal.ca/~bmpix/inbetweening/.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"69 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673047","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guojin Huang, Qing Fang, Zheng Zhang, Ligang Liu, Xiao-Ming Fu
We propose a simple yet effective method to orient normals for point clouds. Central to our approach is a novel optimization objective function defined from global and local perspectives. Globally, we introduce a signed uncertainty function that distinguishes the inside and outside of the underlying surface. Moreover, benefiting from the statistics of our global term, we present a local orientation term instead of a global one. The optimization problem can be solved with commonly used numerical solvers, such as L-BFGS. The capability and feasibility of our approach are demonstrated on various complex point clouds. We achieve higher practical robustness and normal quality than state-of-the-art methods.
{"title":"Stochastic Normal Orientation for Point Clouds","authors":"Guojin Huang, Qing Fang, Zheng Zhang, Ligang Liu, Xiao-Ming Fu","doi":"10.1145/3687944","DOIUrl":"https://doi.org/10.1145/3687944","url":null,"abstract":"We propose a simple yet effective method to orient normals for point clouds. Central to our approach is a novel optimization objective function defined from global and local perspectives. Globally, we introduce a signed uncertainty function that distinguishes the inside and outside of the underlying surface. Moreover, benefiting from the statistics of our global term, we present a local orientation term instead of a global one. The optimization problem can be solved by the commonly used numerical optimization solver, such as L-BFGS. The capability and feasibility of our approach are demonstrated over various complex point clouds. We achieve higher practical robustness and normal quality than the state-of-the-art methods.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"70 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized parallel implementations on GPU or CPU have dramatically enhanced the fidelity, resolution, and accuracy of physical simulations and mesh-based algorithms. However, attaining optimal performance requires expert knowledge and might demand complex code and memory layout optimizations. In addition, physical simulation algorithms require the implementation of derivatives, which can be a tedious and error-prone process. In recent years, researchers and practitioners have investigated the concept of designing systems that allow for a more expressive definition of mesh-based simulation code. These systems leverage domain-specific languages (DSLs), automatic differentiation, or symbolic computing to enhance the readability of implementations without compromising performance. We follow this line of work and propose a symbolic code generation approach tailored to mesh-based computations on parallel devices. Our system extends related work by incorporating collision handling and a data access synchronization approach, enabling rapid sparse matrix assembly.
{"title":"A Mesh-based Simulation Framework using Automatic Code Generation","authors":"Philipp Herholz, Tuur Stuyck, Ladislav Kavan","doi":"10.1145/3687986","DOIUrl":"https://doi.org/10.1145/3687986","url":null,"abstract":"Optimized parallel implementations on GPU or CPU have dramatically enhanced the fidelity, resolution and accuracy of physical simulations and mesh-based algorithms. However, attaining optimal performance requires expert knowledge and might demand complex code and memory layout optimizations. This adds to the fact that physical simulation algorithms require the implementation of derivatives, which can be a tedious and error-prone process. In recent years, researchers and practitioners have investigated the concept of designing systems that allow for a more expressive definition of mesh-based simulation code. These systems leverage domain-specific languages (DSL), automatic differentiation or symbolic computing to enhance readability of implementations without compromising performance. We follow this line of work and propose a symbolic code generation approach tailored to mesh-based computations on parallel devices. Our system extends related work by incorporating collision handling and a data access synchronization approach, enabling rapid sparse matrix assembly.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Caigui Jiang, Dmitry Lyakhov, Florian Rist, Helmut Pottmann, Johannes Wallner
This paper provides computational tools for the modeling and design of quad mesh mechanisms, which are meshes that allow continuous flexions under the assumption of rigid faces and hinges at the edges. We combine methods and results from different areas, namely differential geometry of surfaces, rigidity and flexibility of bar-and-joint frameworks, algebraic geometry, and optimization. The basic idea for achieving a time-continuous flexion is time discretization, justified by an algebraic degree argument. We prove computationally feasible bounds on the number of time instances that must be incorporated in the optimization. For optimization to succeed, an informed initialization is crucial. We present two computational pipelines to achieve this: one based on remeshing isometric surface pairs, the other based on iterative refinement. A third manner of initialization proved very effective: we interactively design meshes that are close to a narrow known class of flexible meshes but not contained in it. Having enjoyed sufficiently many degrees of freedom during design, we afterwards optimize towards flexibility.
{"title":"Quad mesh mechanisms","authors":"Caigui Jiang, Dmitry Lyakhov, Florian Rist, Helmut Pottmann, Johannes Wallner","doi":"10.1145/3687939","DOIUrl":"https://doi.org/10.1145/3687939","url":null,"abstract":"This paper provides computational tools for the modeling and design of quad mesh mechanisms, which are meshes allowing continuous flexions under the assumption of rigid faces and hinges in the edges. We combine methods and results from different areas, namely differential geometry of surfaces, rigidity and flexibility of bar and joint frameworks, algebraic geometry, and optimization. The basic idea to achieve a time-continuous flexion is time-discretization justified by an algebraic degree argument. We are able to prove computationally feasible bounds on the number of required time instances we need to incorporate in our optimization. For optimization to succeed, an informed initialization is crucial. We present two computational pipelines to achieve that: one based on remeshing isometric surface pairs, another one based on iterative refinement. A third manner of initialization proved very effective: We interactively design meshes which are close to a narrow known class of flexible meshes, but not contained in it. Having enjoyed sufficiently many degrees of freedom during design, we afterwards optimize towards flexibility.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"7 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Style-guided texture generation aims to generate a texture that is harmonious with both the style of a reference image and the geometry of an input mesh, given the reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, they require addressing two challenges: 1) completely decoupling style and content from the reference image for 3D models, and 2) aligning the generated texture with the color tone and style of the reference image as well as the given text prompt. To this end, we introduce StyleTex, an innovative diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding content in diffusion-based distillation sampling. Specifically, given a reference image, we first decompose its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. Our novel approach to disentangling the reference image's style and content information allows us to generate distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while utilizing the content feature as a negative prompt to further dissociate content information. Finally, we incorporate these strategies into StyleTex to obtain stylized textures. We utilize Interval Score Matching to address over-smoothness and over-saturation, in combination with a geometry-aware ControlNet that ensures consistent geometry throughout the generative process. The resulting textures generated by StyleTex retain the style of the reference image, while also aligning with the text prompts and intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baseline methods by a significant margin.
{"title":"StyleTex: Style Image-Guided Texture Generation for 3D Models","authors":"Zhiyu Xie, Yuqing Zhang, Xiangjun Tang, Yiqian Wu, Dehan Chen, Gongsheng Li, Xiaogang Jin","doi":"10.1145/3687931","DOIUrl":"https://doi.org/10.1145/3687931","url":null,"abstract":"Style-guided texture generation aims to generate a texture that is harmonious with both the style of the reference image and the geometry of the input mesh, given a reference style image and a 3D mesh with its text description. Although diffusion-based 3D texture generation methods, such as distillation sampling, have numerous promising applications in stylized games and films, it requires addressing two challenges: 1) decouple style and content completely from the reference image for 3D models, and 2) align the generated texture with the color tone, style of the reference image, and the given text prompt. To this end, we introduce StyleTex, an innovative diffusion-model-based framework for creating stylized textures for 3D models. Our key insight is to decouple style information from the reference image while disregarding content in diffusion-based distillation sampling. Specifically, given a reference image, we first decompose its style feature from the image CLIP embedding by subtracting the embedding's orthogonal projection in the direction of the content feature, which is represented by a text CLIP embedding. Our novel approach to disentangling the reference image's style and content information allows us to generate distinct style and content features. We then inject the style feature into the cross-attention mechanism to incorporate it into the generation process, while utilizing the content feature as a negative prompt to further dissociate content information. Finally, we incorporate these strategies into StyleTex to obtain stylized textures. We utilize Interval Score Matching to address over-smoothness and over-saturation, in combination with a geometry-aware ControlNet that ensures consistent geometry throughout the generative process. The resulting textures generated by StyleTex retain the style of the reference image, while also aligning with the text prompts and intrinsic details of the given 3D mesh. Quantitative and qualitative experiments show that our method outperforms existing baseline methods by a significant margin.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"99 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views contain very limited 3D information, leading to two significant challenges: 1) difficulty in building multi-view consistency, as there are too few images for matching; and 2) partially omitted or highly compressed object information, as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render 3D objects with Gaussian splatting that achieves high rendering quality with only four input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, where pre-given accurate camera poses are not required, which achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and unposed images we collected, achieving superior performance from only four views and significantly outperforming previous SOTA methods.
{"title":"GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting","authors":"Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian","doi":"10.1145/3687759","DOIUrl":"https://doi.org/10.1145/3687759","url":null,"abstract":"Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, where pre-given accurate camera poses are not required, which achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our-collected unposed images, achieving superior performance from only four views and significantly outperforming previous SOTA methods.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"33 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MVImgNet is a large-scale dataset that contains multi-view images of ~220k real-world objects in 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, making a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset, which expands MVImgNet to a total of ~520k objects and 515 categories, yielding a larger-scale 3D dataset that is more comparable to those in the 2D domain. In addition to the expanded dataset scale and category range, MVImgNet2.0 is of a higher quality than MVImgNet owing to four new features: (i) most shoots capture 360° views of the objects, which can support learning complete object reconstruction; (ii) the segmentation is improved to produce foreground object masks of higher accuracy; (iii) a more powerful structure-from-motion method is adopted to derive the camera pose for each frame with lower estimation error; (iv) higher-quality dense point clouds are reconstructed via advanced methods for objects captured in 360° views, which can serve downstream applications. Extensive experiments confirm the value of the proposed MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be publicly available at luyues.github.io/mvimgnet2, including multi-view images of all 520k objects, the reconstructed high-quality point clouds, and data annotation codes, hoping to inspire the broader vision community.
{"title":"MVImgNet2.0: A Larger-scale Dataset of Multi-view Images","authors":"Yushuang Wu, Luyue Shi, Haolin Liu, Hongjie Liao, Lingteng Qiu, Weihao Yuan, Xiaodong Gu, Zilong Dong, Shuguang Cui, Xiaoguang Han","doi":"10.1145/3687973","DOIUrl":"https://doi.org/10.1145/3687973","url":null,"abstract":"MVImgNet is a large-scale dataset that contains multi-view images of ~220k real-world objects in 238 classes. As a counterpart of ImageNet, it introduces 3D visual signals via multi-view shooting, making a soft bridge between 2D and 3D vision. This paper constructs the MVImgNet2.0 dataset that expands MVImgNet into a total of ~520k objects and 515 categories, which derives a 3D dataset with a larger scale that is more comparable to ones in the 2D domain. In addition to the expanded dataset scale and category range, MVImgNet2.0 is of a higher quality than MVImgNet owing to four new features: (i) most shoots capture 360° views of the objects, which can support the learning of object reconstruction with completeness; (ii) the segmentation manner is advanced to produce foreground object masks of higher accuracy; (iii) a more powerful structure-from-motion method is adopted to derive the camera pose for each frame of a lower estimation error; (iv) higher-quality dense point clouds are reconstructed via advanced methods for objects captured in 360 <jats:sup>°</jats:sup> views, which can serve for downstream applications. Extensive experiments confirm the value of the proposed MVImgNet2.0 in boosting the performance of large 3D reconstruction models. MVImgNet2.0 will be public at <jats:italic>luyues.github.io/mvimgnet2</jats:italic> , including multi-view images of all 520k objects, the reconstructed high-quality point clouds, and data annotation codes, hoping to inspire the broader vision community.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"80 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}