ACM Transactions on Graphics: Latest Articles

Plug-and-Play Algorithms for Dynamic Non-line-of-sight Imaging

IF 6.2 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Graphics

Pub Date : 2024-05-14 DOI: 10.1145/3665139

Juntian Ye, Yu Hong, Xiongfei Su, Xin Yuan, Feihu Xu

Non-line-of-sight (NLOS) imaging can recover 3D images of scenes outside the direct line of sight, which is of growing interest for diverse applications. Despite remarkable progress, NLOS imaging of dynamic objects remains challenging because reconstructing a single frame requires a large number of multibounce photons. To overcome this obstacle, we develop a computational framework for dynamic time-of-flight NLOS imaging based on plug-and-play (PnP) algorithms. By combining the imaging forward model with a deep denoising network from the computer vision community, we show 3D NLOS video recovery (128 × 128 × 512) at 4 frames per second (fps) in post-processing. Our method leverages the temporal similarity among adjacent frames and incorporates sparse priors and frequency filtering, which enables higher-quality reconstructions for complex scenes. Extensive experiments on both simulated and real data verify the superior performance of the proposed algorithm.
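
The general plug-and-play idea mentioned in the abstract can be illustrated with a small toy: alternate a gradient step on the data-fidelity term with a call to an off-the-shelf denoiser. The sketch below is not the authors' NLOS pipeline. It assumes a trivial forward model (a sampling mask that keeps every other sample of a 1D signal) and a 3-tap moving average standing in for the deep denoising network; all function names are hypothetical.

```python
import numpy as np

def smooth_denoiser(x):
    """Stand-in denoiser (assumption): 3-tap weighted average with edge replication."""
    padded = np.pad(x, 1, mode="edge")
    return 0.25 * padded[:-2] + 0.5 * padded[1:-1] + 0.25 * padded[2:]

def pnp_proximal_gradient(y, forward, adjoint, denoiser, step=1.0, iters=50):
    """Plug-and-play proximal gradient: the denoiser replaces the proximal operator."""
    x = adjoint(y)  # start from the back-projected measurements
    for _ in range(iters):
        grad = adjoint(forward(x) - y)   # gradient of 0.5 * ||A x - y||^2
        x = denoiser(x - step * grad)    # prior enforced by the denoiser
    return x

# Toy problem: recover a sinusoid from every other sample.
n = 64
x_true = np.sin(2 * np.pi * np.arange(n) / n)
mask = np.zeros(n)
mask[::2] = 1.0

forward = lambda x: mask * x   # toy forward model A (not an NLOS transient model)
adjoint = lambda r: mask * r   # A is diagonal, so A^T equals A

y = forward(x_true)
x_hat = pnp_proximal_gradient(y, forward, adjoint, smooth_denoiser)

err_zero_filled = np.sqrt(np.mean((adjoint(y) - x_true) ** 2))
err_pnp = np.sqrt(np.mean((x_hat - x_true) ** 2))
print(f"RMS error, zero-filled: {err_zero_filled:.4f}; PnP: {err_pnp:.4f}")
```

In a real system the mask would be replaced by the imaging forward model and the moving average by a learned denoiser; the loop structure is the part that carries over.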

Citations: 0

View-Independent Adjoint Light Tracing for Lighting Design Optimization

Pub Date : 2024-05-03 DOI: 10.1145/3662180

Lukas Lipp, David Hahn, Pierre Ecormier-Nocca, Florian Rist, Michael Wimmer

Differentiable rendering methods promise the ability to optimize various parameters of 3d scenes to achieve a desired result. However, lighting design has so far received little attention in this field. In this paper, we introduce a method that enables continuous optimization of the arrangement of luminaires in a 3d scene via differentiable light tracing. Our experiments show two major issues when attempting to apply existing methods from differentiable path tracing to this problem: first, many rendering methods produce images, which restricts the ability of a designer to define lighting objectives to image space. Second, most previous methods are designed for scene geometry or material optimization and have not been extensively tested for the case of optimizing light sources. Currently available differentiable ray-tracing methods do not provide satisfactory performance, even on fairly basic test cases in our experience. In this paper, we propose a novel adjoint light tracing method that overcomes these challenges and enables gradient-based lighting design optimization in a view-independent (camera-free) way. Thus, we allow the user to paint illumination targets directly onto the 3d scene or use existing baked illumination data (e.g., light maps). Using modern ray-tracing hardware, we achieve interactive performance. We find light tracing advantageous over path tracing in this setting, as it naturally handles irregular geometry, resulting in less noise and improved optimization convergence. We compare our adjoint gradients to state-of-the-art image-based differentiable rendering methods. We also demonstrate that our gradient data works with various common optimization algorithms, providing good convergence behaviour. Qualitative comparisons with real-world scenes underline the practical applicability of our method.

Citations: 0

I❤MESH: A DSL for Mesh Processing

Pub Date : 2024-05-01 DOI: 10.1145/3662181

Yong Li, Shoaib Kamil, Keenan Crane, Alec Jacobson, Yotam Gingold

Mesh processing algorithms are often communicated via concise mathematical notation (e.g., summation over mesh neighborhoods). However, conversion of notation into working code remains a time consuming and error-prone process which requires arcane knowledge of low-level data structures and libraries—impeding rapid exploration of high-level algorithms. We address this problem by introducing a domain-specific language (DSL) for mesh processing called I❤MESH, which resembles notation commonly used in visual and geometric computing, and automates the process of converting notation into code. The centerpiece of our language is a flexible notation for specifying and manipulating neighborhoods of a cell complex, internally represented via standard operations on sparse boundary matrices. This layered design enables natural expression of algorithms while minimizing demands on a code generation back-end. In particular, by integrating I❤MESH with the linear algebra features of the I❤LA DSL, and adding support for automatic differentiation, we can rapidly implement a rich variety of algorithms on point clouds, surface meshes, and volume meshes.
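
To make the phrase "neighborhoods represented via operations on boundary matrices" concrete, here is a minimal sketch in plain NumPy. It is not I❤MESH syntax and does not reflect the DSL's generated code; the two-triangle mesh, the variable names, and the use of dense arrays (a real implementation would use sparse storage) are all assumptions made for illustration. It builds the signed edge-vertex boundary matrix and uses it to answer a vertex-neighborhood query and to evaluate a summation over neighbors.

```python
import numpy as np

# Toy mesh (assumption): two triangles sharing the edge (0, 2).
triangles = [(0, 1, 2), (0, 2, 3)]
num_vertices = 4

# Unique undirected edges, each stored as a sorted vertex pair.
edges = sorted({tuple(sorted((tri[k], tri[(k + 1) % 3])))
                for tri in triangles for k in range(3)})

# Signed edge-vertex boundary matrix: row e has -1 at the tail and +1 at the head.
d0 = np.zeros((len(edges), num_vertices))
for e, (a, b) in enumerate(edges):
    d0[e, a] = -1.0
    d0[e, b] = 1.0

# Vertex adjacency follows from the unsigned boundary matrix.
adjacency = np.abs(d0).T @ np.abs(d0)
np.fill_diagonal(adjacency, 0.0)

def vertex_neighbors(i):
    """Vertices sharing an edge with vertex i."""
    return np.flatnonzero(adjacency[i]).tolist()

# Graph Laplacian: (L x)_i = sum over neighbors j of (x_i - x_j).
laplacian = d0.T @ d0
x = np.array([0.0, 1.0, 2.0, 3.0])
neighborhood_sums = laplacian @ x

print(vertex_neighbors(0), neighborhood_sums)
```

The point of the layering is that a notation such as "sum over the neighbors of vertex i" reduces to a handful of matrix products, which keeps the demands on a code generator small.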

Citations: 0

Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022

Pub Date : 2024-04-27 DOI: 10.1145/3656374

Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter

This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research papers, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field.

The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall’s tau rank correlation of around -0.5. Based on the challenge results we formulate numerous recommendations for system building and evaluation.
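
For readers unfamiliar with the statistic, the following sketch computes Kendall's tau (the tau-a variant, which ignores tie corrections) between an objective distance and a subjective rating. The numbers are made-up toy values chosen only to show a negative rank correlation; they are not data from the challenge, and the function name is hypothetical.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy values (assumption): a lower distance tends to go with a higher rating.
toy_distance = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_rating = [70, 60, 65, 40, 50]

tau = kendall_tau(toy_distance, toy_rating)
print(tau)  # -0.6 for these toy values
```

A negative tau is the expected direction here: a distance-like metric that tracks perceived quality should decrease as ratings increase.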

Citations: 0

Differentiable solver for time-dependent deformation problems with contact

Pub Date : 2024-04-26 DOI: 10.1145/3657648

Zizhou Huang, Davi Colli Tozoni, Arvi Gjoka, Zachary Ferguson, Teseo Schneider, Daniele Panozzo, Denis Zorin

We introduce a general differentiable solver for time-dependent deformation problems with contact and friction. Our approach uses a finite element discretization with a high-order time integrator coupled with the recently proposed incremental potential contact method for handling contact and friction forces to solve ODE- and PDE-constrained optimization problems on scenes with complex geometry. It supports static and dynamic problems and differentiation with respect to all physical parameters involved in the physical problem description, which include shape, material parameters, friction parameters, and initial conditions. Our analytically derived adjoint formulation is efficient, with a small overhead (typically less than 10% for nonlinear problems) over the forward simulation, and shares many similarities with the forward problem, allowing the reuse of large parts of existing forward simulator code.

We implement our approach on top of the open-source PolyFEM library and demonstrate the applicability of our solver to shape design, initial condition optimization, and material estimation on both simulated results and physical validations.
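
The adjoint idea behind such solvers can be shown on a tiny static linear problem. The sketch below is not PolyFEM code and involves no contact, friction, or time integration; the 2×2 "stiffness" matrices, the load, the target, and the function names are all toy assumptions. It computes the gradient of an objective with respect to one parameter using a single extra linear solve, then checks it against a central finite difference.

```python
import numpy as np

# Toy static problem (assumption): K(p) u = f with K(p) = K0 + p * K1.
K0 = np.array([[2.0, -1.0], [-1.0, 2.0]])
K1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # dK/dp
f = np.array([1.0, 0.0])
u_target = np.array([0.5, 0.5])

def solve_forward(p):
    """Forward simulation: solve the linear system for the state u."""
    return np.linalg.solve(K0 + p * K1, f)

def objective(p):
    """J(p) = 0.5 * ||u(p) - u_target||^2."""
    u = solve_forward(p)
    return 0.5 * np.sum((u - u_target) ** 2)

def adjoint_gradient(p):
    """dJ/dp = -lambda^T (dK/dp) u, where K^T lambda = u - u_target."""
    K = K0 + p * K1
    u = np.linalg.solve(K, f)
    lam = np.linalg.solve(K.T, u - u_target)  # one adjoint solve, reuses K
    return -lam @ (K1 @ u)

p = 0.3
h = 1e-6
grad_adjoint = adjoint_gradient(p)
grad_fd = (objective(p + h) - objective(p - h)) / (2 * h)
print(grad_adjoint, grad_fd)
```

The adjoint solve uses the transpose of the same system matrix as the forward solve, which is one reason adjoint formulations can reuse large parts of a forward simulator.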

Citations: 0

Real-Time Neural Appearance Models

Pub Date : 2024-04-20 DOI: 10.1145/3659577

Tizian Zeltner, Fabrice Rousselle, Andrea Weidlich, Petrik Clarberg, Jan Novák, Benedikt Bitterli, Alex Evans, Tomáš Davidovič, Simon Kallweit, Aaron Lefohn

We present a complete system for real-time rendering of scenes with complex appearance previously reserved for offline use. This is achieved with a combination of algorithmic and system level innovations.

Our appearance model utilizes learned hierarchical textures that are interpreted using neural decoders, which produce reflectance values and importance-sampled directions. To best utilize the modeling capacity of the decoders, we equip the decoders with two graphics priors. The first prior—transformation of directions into learned shading frames—facilitates accurate reconstruction of mesoscale effects. The second prior—a microfacet sampling distribution—allows the neural decoder to perform importance sampling efficiently. The resulting appearance model supports anisotropic sampling and level-of-detail rendering, and allows baking deeply layered material graphs into a compact unified neural representation.

By exposing hardware accelerated tensor operations to ray tracing shaders, we show that it is possible to inline and execute the neural decoders efficiently inside a real-time path tracer. We analyze scalability with increasing number of neural materials and propose to improve performance using code optimized for coherent and divergent execution. Our neural material shaders can be over an order of magnitude faster than non-neural layered materials. This opens up the door for using film-quality visuals in real-time applications such as games and live previews.

Citations: 0

ConceptLab: Creative Concept Generation using VLM-Guided Diffusion Prior Constraints

Pub Date : 2024-04-16 DOI: 10.1145/3659578

Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or

Recent text-to-image generative models have enabled us to transform our words into vibrant, captivating imagery. The surge of personalization techniques that has followed has also allowed us to imagine unique concepts in new scenes. However, an intriguing question remains: How can we generate a new, imaginary concept that has never been seen before? In this paper, we present the task of creative text-to-image generation, where we seek to generate new members of a broad category (e.g., generating a pet that differs from all existing pets). We leverage the under-studied Diffusion Prior models and show that the creative generation problem can be formulated as an optimization process over the output space of the diffusion prior, resulting in a set of “prior constraints”. To keep our generated concept from converging into existing members, we incorporate a question-answering Vision-Language Model (VLM) that adaptively adds new constraints to the optimization problem, encouraging the model to discover increasingly more unique creations. Finally, we show that our prior constraints can also serve as a strong mixing mechanism allowing us to create hybrids between generated concepts, introducing even more flexibility into the creative process.

Citations: 0

DMHomo: Learning Homography with Diffusion Models

Pub Date : 2024-03-11 DOI: 10.1145/3652207

Haipeng Li, Hai Jiang, Ao Luo, Ping Tan, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

Supervised homography estimation methods face a challenge due to the lack of adequate labeled training data. To address this issue, we propose DMHomo, a diffusion model-based framework for supervised homography learning. This framework generates image pairs with accurate labels, realistic image content, and realistic interval motion, ensuring they satisfy adequate pairs. We utilize unlabeled image pairs with pseudo-labels such as homography and dominant plane masks, computed from existing methods, to train a diffusion model that generates a supervised training dataset. To further enhance performance, we introduce a new probabilistic mask loss, which identifies outlier regions through supervised training, and an iterative mechanism to optimize the generative and homography models successively. Our experimental results demonstrate that DMHomo effectively overcomes the scarcity of qualified datasets in supervised homography learning and improves generalization to real-world scenes. The code and dataset are available at: https://github.com/lhaippp/DMHomo
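
As background on what a homography label is, the sketch below maps points through a 3×3 homography and recovers that matrix from four point correspondences with the classical direct linear transform (DLT). This is a textbook technique shown for orientation only, not DMHomo's diffusion-based method; the matrix values and function names are toy assumptions.

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def estimate_homography_dlt(src, dst):
    """Direct linear transform from at least 4 correspondences (no outlier handling)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows, dtype=float)
    # The homography is the null vector of A: the last right-singular vector.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale so that H[2, 2] == 1

# Toy ground-truth homography (assumption) and the unit-square corners.
H_true = np.array([[1.10, 0.05, 0.5],
                   [-0.02, 0.95, -0.3],
                   [0.10, 0.20, 1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dst = apply_homography(H_true, src)

H_est = estimate_homography_dlt(src, dst)
print(np.round(H_est, 4))
```

With exact, outlier-free correspondences the estimate matches the ground truth up to numerical precision; real image pairs violate both assumptions, which is part of why labeled training data is hard to obtain.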

Citations: 0

Joint Stroke Tracing and Correspondence for 2D Animation

Pub Date : 2024-02-29 DOI: 10.1145/3649890

Haoran Mo, Chengying Gao, Ruomei Wang

To alleviate human labor in redrawing keyframes with ordered vector strokes for automatic inbetweening, we for the first time propose a joint stroke tracing and correspondence approach. Given consecutive raster keyframes along with a single vector image of the starting frame as a guidance, the approach generates vector drawings for the remaining keyframes while ensuring one-to-one stroke correspondence. Our framework trained on clean line drawings generalizes to rough sketches and the generated results can be imported into inbetweening systems to produce inbetween sequences. Hence, the method is compatible with standard 2D animation workflow. An adaptive spatial transformation module (ASTM) is introduced to handle non-rigid motions and stroke distortion. We collect a dataset for training, with 10k+ pairs of raster frames and their vector drawings with stroke correspondence. Comprehensive validations on real clean and rough animated frames manifest the effectiveness of our method and superiority to existing methods.

Citations: 0

A Dual-Particle Approach for Incompressible SPH Fluids

Pub Date : 2024-02-29 DOI: 10.1145/3649888

Shusen Liu, Xiaowei He, Yuzhong Guo, Yue Chang, Wencheng Wang

Tensile instability is one of the major obstacles to particle methods in fluid simulation: it causes particles to clump in pairs under tension and prevents the simulation from generating small-scale thin features. To address this issue, previous particle methods use either a background pressure or a finite difference scheme to alleviate the particle clustering artifacts, yet still fail to produce small-scale thin features in free-surface flows. In this paper, we propose a dual-particle approach for simulating incompressible fluids. Our approach involves incorporating supplementary virtual particles designed to capture and store particle pressures. These pressure samples undergo systematic redistribution at each time step, grounded in the initial positions of the fluid particles. By doing so, we effectively reduce tensile instability in standard SPH by narrowing down the unstable regions for particles experiencing tensile stress. As a result, we can accurately simulate free-surface flows with rich small-scale thin features, such as droplets, streamlines, and sheets, as demonstrated by experimental results.

Citations: 0
