Budmonde Duinkharjav, Jenna Kang, Gavin Stuart Peter Miller, Chang Xiao, Qi Sun
Precisely understanding how objects move in 3D is essential for broad scenarios such as video editing, gaming, driving, and athletics. With screen-displayed computer graphics content, users perceive only limited cues, derived from the on-screen optical flow, for judging object motion. Conventionally, visual perception is studied with stationary settings and singular objects. However, in practical applications, we---the observer---also move within complex scenes. Therefore, we must extract object motion from a combined optical flow displayed on screen, which can often lead to mis-estimations due to perceptual ambiguities. We measure and model observers' perceptual accuracy of object motions in dynamic 3D environments, a universal but under-investigated scenario in computer graphics applications. We design and employ a crowdsourcing-based psychophysical study, quantifying the relationships among patterns of scene dynamics and content, and the resulting perceptual judgments of object motion direction. The acquired psychophysical data underpins a model for generalized conditions. We then demonstrate how the model's guidance significantly enhances users' understanding of task object motion in gaming and animation design. With applications in measuring and compensating for object motion errors in video and rendering, we hope the research establishes a new frontier for understanding and mitigating perceptual errors caused by the gap between screen-displayed graphics and the physical world.
{"title":"Evaluating Visual Perception of Object Motion in Dynamic Environments","authors":"Budmonde Duinkharjav, Jenna Kang, Gavin Stuart Peter Miller, Chang Xiao, Qi Sun","doi":"10.1145/3687912","DOIUrl":"https://doi.org/10.1145/3687912","url":null,"abstract":"Precisely understanding how objects move in 3D is essential for broad scenarios such as video editing, gaming, driving, and athletics. With screen-displayed computer graphics content, users only perceive limited cues to judge the object motion from the on-screen optical flow. Conventionally, visual perception is studied with stationary settings and singular objects. However, in practical applications, we---the observer---also move within complex scenes. Therefore, we must extract object motion from a combined optical flow displayed on screen, which can often lead to mis-estimations due to perceptual ambiguities. We measure and model observers' perceptual accuracy of object motions in dynamic 3D environments, a universal but under-investigated scenario in computer graphics applications. We design and employ a crowdsourcing-based psychophysical study, quantifying the relationships among patterns of scene dynamics and content, and the resulting perceptual judgments of object motion direction. The acquired psychophysical data underpins a model for generalized conditions. We then demonstrate the model's guidance ability to significantly enhance users' understanding of task object motion in gaming and animation design. With applications in measuring and compensating for object motion errors in video and rendering, we hope the research establishes a new frontier for understanding and mitigating perceptual errors caused by the gap between screen-displayed graphics and the physical world.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"46 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Text-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired artistic videos due to (i) text's inherent clumsiness in expressing specific styles and (ii) the generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pretrained T2V models with a style control adapter, allowing video generation in any style by feeding a reference image. Considering the scarcity of artistic video data, we propose to first train a style control adapter using style-rich image datasets, then transfer the learned stylization ability to video generation through a tailor-made finetuning paradigm. To promote content-style disentanglement, we employ carefully designed data augmentation strategies to enhance decoupled learning. Additionally, we propose a scale-adaptive fusion module to balance the influences of text-based content features and image-based style features, which helps generalization across various text and style combinations. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images. Experiments demonstrate that our approach is more flexible and efficient than existing competitors. Project page: https://gongyeliu.github.io/StyleCrafter.github.io/
{"title":"StyleCrafter: Taming Artistic Video Diffusion with Reference-Augmented Adapter Learning","authors":"Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Yibo Wang, Xintao Wang, Ying Shan, Yujiu Yang","doi":"10.1145/3687975","DOIUrl":"https://doi.org/10.1145/3687975","url":null,"abstract":"Text-to-video (T2V) models have shown remarkable capabilities in generating diverse videos. However, they struggle to produce user-desired artistic videos due to (i) text's inherent clumsiness in expressing specific styles and (ii) the generally degraded style fidelity. To address these challenges, we introduce StyleCrafter, a generic method that enhances pretrained T2V models with a style control adapter, allowing video generation in any style by feeding a reference image. Considering the scarcity of artistic video data, we propose to first train a style control adapter using style-rich image datasets, then transfer the learned stylization ability to video generation through a tailor-made finetuning paradigm. To promote content-style disentanglement, we employ carefully designed data augmentation strategies to enhance decoupled learning. Additionally, we propose a scale-adaptive fusion module to balance the influences of text-based content features and image-based style features, which helps generalization across various text and style combinations. StyleCrafter efficiently generates high-quality stylized videos that align with the content of the texts and resemble the style of the reference images. Experiments demonstrate that our approach is more flexible and efficient than existing competitors. Project page: https://gongyeliu.github.io/StyleCrafter.github.io/","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"176 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri
Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a T2I model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight Spatial Adapters that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on "frozen videos" (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel Motion Adapter module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.
{"title":"Still-Moving: Customized Video Generation without Customized Video Data","authors":"Hila Chefer, Shiran Zada, Roni Paiss, Ariel Ephrat, Omer Tov, Michael Rubinstein, Lior Wolf, Tali Dekel, Tomer Michaeli, Inbar Mosseri","doi":"10.1145/3687945","DOIUrl":"https://doi.org/10.1145/3687945","url":null,"abstract":"Customizing text-to-image (T2I) models has seen tremendous progress recently, particularly in areas such as personalization, stylization, and conditional generation. However, expanding this progress to video generation is still in its infancy, primarily due to the lack of customized video data. In this work, we introduce Still-Moving, a novel generic framework for customizing a text-to-video (T2V) model, without requiring any customized video data. The framework applies to the prominent T2V design where the video model is built over a T2I model (e.g., via inflation). We assume access to a customized version of the T2I model, trained only on still image data (e.g., using DreamBooth). Naively plugging in the weights of the customized T2I model into the T2V model often leads to significant artifacts or insufficient adherence to the customization data. To overcome this issue, we train lightweight <jats:italic>Spatial Adapters</jats:italic> that adjust the features produced by the injected T2I layers. Importantly, our adapters are trained on <jats:italic>\"frozen videos\"</jats:italic> (i.e., repeated images), constructed from image samples generated by the customized T2I model. This training is facilitated by a novel <jats:italic>Motion Adapter</jats:italic> module, which allows us to train on such static videos while preserving the motion prior of the video model. At test time, we remove the Motion Adapter modules and leave in only the trained Spatial Adapters. This restores the motion prior of the T2V model while adhering to the spatial prior of the customized T2I model. We demonstrate the effectiveness of our approach on diverse tasks including personalized, stylized, and conditional generation. In all evaluated scenarios, our method seamlessly integrates the spatial prior of the customized T2I model with a motion prior supplied by the T2V model.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"19 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Sina Nabizadeh, Ritoban Roy-Chowdhury, Hang Yin, Ravi Ramamoorthi, Albert Chern
We propose Coadjoint Orbit FLIP (CO-FLIP), a high-order accurate, structure-preserving fluid simulation method in the hybrid Eulerian-Lagrangian framework. We start with a Hamiltonian formulation of the incompressible Euler equations, and then, using a local, explicit, and high-order divergence-free interpolation, construct a modified Hamiltonian system that governs our discrete Euler flow. The resulting discretization, when paired with a geometric time integration scheme, is energy and circulation preserving (formally, the flow evolves on a coadjoint orbit) and is similar to the Fluid Implicit Particle (FLIP) method. CO-FLIP enjoys multiple additional properties, including that the pressure projection is exact in the weak sense and that the particle-to-grid transfer is an exact inverse of the grid-to-particle interpolation. The method is demonstrated numerically with outstanding stability as well as energy and Casimir preservation. We show that the method performs well on benchmarks and produces turbulent visual effects even at low grid resolutions.
{"title":"Fluid Implicit Particles on Coadjoint Orbits","authors":"Mohammad Sina Nabizadeh, Ritoban Roy-Chowdhury, Hang Yin, Ravi Ramamoorthi, Albert Chern","doi":"10.1145/3687970","DOIUrl":"https://doi.org/10.1145/3687970","url":null,"abstract":"We propose Coadjoint Orbit FLIP (CO-FLIP), a high order accurate, structure preserving fluid simulation method in the hybrid Eulerian-Lagrangian framework. We start with a Hamiltonian formulation of the incompressible Euler Equations, and then, using a local, explicit, and high order divergence free interpolation, construct a modified Hamiltonian system that governs our discrete Euler flow. The resulting discretization, when paired with a geometric time integration scheme, is energy and circulation preserving (formally the flow evolves on a coadjoint orbit) and is similar to the Fluid Implicit Particle (FLIP) method. CO-FLIP enjoys multiple additional properties including that the pressure projection is exact in the weak sense, and the particle-to-grid transfer is an exact inverse of the grid-to-particle interpolation. The method is demonstrated numerically with outstanding stability, energy, and Casimir preservation. We show that the method produces benchmarks and turbulent visual effects even at low grid resolutions.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"69 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Wang, Xudong Wang, Pengfei Wang, Shuangmin Chen, Shiqing Xin, Jiong Guo, Wenping Wang, Changhe Tu
Surface offsetting is a crucial operation in digital geometry processing and computer-aided design, where an offset is defined as an iso-value surface of the distance field. A challenge emerges as even smooth surfaces can exhibit sharp features in their offsets due to the non-differentiable characteristics of the underlying distance field. Prevailing approaches to the offsetting problem involve approximating the distance field and then extracting the iso-surface. However, even with dual contouring (DC), there is a risk of degrading sharp feature points/lines due to the inaccurate discretization of the distance field. This issue is exacerbated when the input is a piecewise-linear triangle mesh. This study is inspired by the observation that a triangle-based distance field, unlike the complex distance field rooted at the entire surface, remains smooth across the entire 3D space except at the triangle itself. With a polygonal surface comprising n triangles, the final distance field for accommodating the offset surface is obtained as the pointwise minimum of these n triangle-based distance fields. In implementation, our approach starts by tetrahedralizing the space around the offset surface, enabling a tetrahedron-wise linear approximation for each triangle-based distance field. The final offset surface within a tetrahedral range can be traced by slicing the tetrahedron with planes. As illustrated in the teaser figure, a key advantage of our algorithm is its ability to precisely preserve sharp features. Furthermore, this paper addresses the problem of simplifying the offset surface's complexity while preserving sharp features, formulating it as a maximal-clique problem.
{"title":"PCO: Precision-Controllable Offset Surfaces with Sharp Features","authors":"Lei Wang, Xudong Wang, Pengfei Wang, Shuangmin Chen, Shiqing Xin, Jiong Guo, Wenping Wang, Changhe Tu","doi":"10.1145/3687920","DOIUrl":"https://doi.org/10.1145/3687920","url":null,"abstract":"Surface offsetting is a crucial operation in digital geometry processing and computer-aided design, where an offset is defined as an iso-value surface of the distance field. A challenge emerges as even smooth surfaces can exhibit sharp features in their offsets due to the non-differentiable characteristics of the underlying distance field. Prevailing approaches to the offsetting problem involve approximating the distance field and then extracting the iso-surface. However, even with dual contouring (DC), there is a risk of degrading sharp feature points/lines due to the inaccurate discretization of the distance field. This issue is exacerbated when the input is a piecewise-linear triangle mesh. This study is inspired by the observation that a triangle-based distance field, unlike the complex distance field rooted at the entire surface, remains smooth across the entire 3D space except at the triangle itself. With a polygonal surface comprising <jats:italic>n</jats:italic> triangles, the final distance field for accommodating the offset surface is determined by minimizing these <jats:italic>n</jats:italic> triangle-based distance fields. In implementation, our approach starts by tetrahedralizing the space around the offset surface, enabling a tetrahedron-wise linear approximation for each triangle-based distance field. The final offset surface within a tetrahedral range can be traced by slicing the tetrahedron with planes. As illustrated in the teaser figure, a key advantage of our algorithm is its ability to precisely preserve sharp features. Furthermore, this paper addresses the problem of simplifying the offset surface's complexity while preserving sharp features, formulating it as a maximal-clique problem.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"1 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anthony Cisneros Ramos, Martin Kilian, Alisher Aikyn, Helmut Pottmann, Christian Müller
Meshes with spherical faces and circular edges are an attractive alternative to polyhedral meshes for applications in architecture and design. Approximation of a given surface by such a mesh needs to consider the visual appearance, the approximation quality, the position and orientation of circular intersections of neighboring faces, and the existence of a torsion-free support structure formed by the planes of circular edges. The latter requirement implies that the mesh simultaneously defines a second mesh whose faces lie on the same spheres as the faces of the first mesh. It is a discretization of the two envelopes of a sphere congruence, i.e., a two-parameter family of spheres. We relate such sphere congruences to torsal parameterizations of associated line congruences. Turning practical requirements into properties of such a line congruence, we optimize line and sphere congruences as a basis for computing a mesh with spherical triangular or quadrilateral faces that approximates a given reference surface.
{"title":"Approximation by Meshes with Spherical Faces","authors":"Anthony Cisneros Ramos, Martin Kilian, Alisher Aikyn, Helmut Pottmann, Christian Müller","doi":"10.1145/3687942","DOIUrl":"https://doi.org/10.1145/3687942","url":null,"abstract":"Meshes with spherical faces and circular edges are an attractive alternative to polyhedral meshes for applications in architecture and design. Approximation of a given surface by such a mesh needs to consider the visual appearance, approximation quality, the position and orientation of circular intersections of neighboring faces and the existence of a torsion free support structure that is formed by the planes of circular edges. The latter requirement implies that the mesh simultaneously defines a second mesh whose faces lie on the same spheres as the faces of the first mesh. It is a discretization of the two envelopes of a sphere congruence, i.e., a two-parameter family of spheres. We relate such sphere congruences to torsal parameterizations of associated line congruences. Turning practical requirements into properties of such a line congruence, we optimize line and sphere congruence as a basis for computing a mesh with spherical triangular or quadrilateral faces that approximates a given reference surface.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"229 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy that reduces storage requirements while maintaining rapid training speed. The first stage employs hash encoding and a shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine-tunes other Gaussian attributes using a residual entropy loss and a temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. Meanwhile, we designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3: it outperforms other methods by enabling high-quality rendering and streaming on common devices, a capability not demonstrated before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at https://authoritywang.github.io/v3/.
{"title":"V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians","authors":"Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu","doi":"10.1145/3687935","DOIUrl":"https://doi.org/10.1145/3687935","url":null,"abstract":"Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V <jats:sup>3</jats:sup> (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy to reduce storage requirements with rapid training speed. The first stage employs hash encoding and shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine tunes other Gaussian attributes using residual entropy loss and temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. Meanwhile, we designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V <jats:sup>3</jats:sup> , outperforming other methods by enabling high-quality rendering and streaming on common devices, which is unseen before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at https://authoritywang.github.io/v3/.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"14 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a method for high-quality 3D reconstruction from multi-view images. Our method uses a new point-based representation, the regularized dipole sum, which generalizes the winding number to allow for interpolation of per-point attributes in point clouds with noisy or outlier points. Using regularized dipole sums, we represent implicit geometry and radiance fields as per-point attributes of a dense point cloud, which we initialize from structure from motion. We additionally derive Barnes-Hut fast summation schemes for accelerated forward and adjoint dipole sum queries. These queries facilitate the use of ray tracing to efficiently and differentiably render images with our point-based representations, and thus update their point attributes to optimize scene geometry and appearance. We evaluate our method in inverse rendering applications against state-of-the-art alternatives, based on ray tracing of neural representations or rasterization of Gaussian point-based representations. Our method significantly improves 3D reconstruction quality and robustness at equal runtimes, while also supporting more general rendering methods such as shadow rays for direct illumination.
{"title":"3D Reconstruction with Fast Dipole Sums","authors":"Hanyu Chen, Bailey Miller, Ioannis Gkioulekas","doi":"10.1145/3687914","DOIUrl":"https://doi.org/10.1145/3687914","url":null,"abstract":"We introduce a method for high-quality 3D reconstruction from multi-view images. Our method uses a new point-based representation, the regularized dipole sum, which generalizes the winding number to allow for interpolation of per-point attributes in point clouds with noisy or outlier points. Using regularized dipole sums, we represent implicit geometry and radiance fields as per-point attributes of a dense point cloud, which we initialize from structure from motion. We additionally derive Barnes-Hut fast summation schemes for accelerated forward and adjoint dipole sum queries. These queries facilitate the use of ray tracing to efficiently and differentiably render images with our point-based representations, and thus update their point attributes to optimize scene geometry and appearance. We evaluate our method in inverse rendering applications against state-of-the-art alternatives, based on ray tracing of neural representations or rasterization of Gaussian point-based representations. Our method significantly improves 3D reconstruction quality and robustness at equal runtimes, while also supporting more general rendering methods such as shadow rays for direct illumination.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"112 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Handi Yin, Bonan Liu, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui
We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses six inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers dense scene mapping in near real-time. Further, it is fast and robust to initialize and fully closes the loop between physically plausible map-aware global human motion estimation and mocap-aware 3D scene reconstruction. To achieve this, we design a tightly coupled mocap-aware dense bundle adjustment and a physics-based body pose correction module leveraging a local body-centric elevation map. The latter introduces a novel terrain-aware contact PD controller, which enables characters to physically contact the given local elevation map, thereby reducing human floating or penetration. We demonstrate the performance of our system on established synthetic and real-world benchmarks. The results show that our method reduces human localization, camera pose, and mapping errors by 41%, 71%, and 46%, respectively, compared to the state of the art. Our qualitative evaluations on newly captured data further demonstrate that EgoHDM can handle challenging scenarios on non-flat terrain, including stepping over stairs and outdoor scenes in the wild. Our project page: https://handiyin.github.io/EgoHDM/
{"title":"EgoHDM: A Real-time Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System","authors":"Handi Yin, Bonan Liu, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui","doi":"10.1145/3687907","DOIUrl":"https://doi.org/10.1145/3687907","url":null,"abstract":"We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses 6 inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers <jats:italic>dense</jats:italic> scene mapping in <jats:italic>near real-time.</jats:italic> Further, it is fast and robust to initialize and fully closes the loop between physically plausible map-aware global human motion estimation and mocap-aware 3D scene reconstruction. To achieve this, we design a tightly coupled mocap-aware dense bundle adjustment and physics-based body pose correction module leveraging a local body-centric elevation map. The latter introduces a novel terrain-aware contact PD controller, which enables characters to physically contact the given local elevation map thereby reducing human floating or penetration. We demonstrate the performance of our system on established synthetic and real-world benchmarks. The results show that our method reduces human localization, camera pose, and mapping accuracy error by 41%, 71%, 46%, respectively, compared to the state of the art. Our qualitative evaluations on newly captured data further demonstrate that EgoHDM can cover challenging scenarios in non-flat terrain including stepping over stairs and outdoor scenes in the wild. Our project page: https://handiyin.github.io/EgoHDM/","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"176 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Procedural implicit surfaces are a popular representation for shape modeling. They provide a simple framework for complex geometric operations such as Booleans, blending, and deformations. However, editing them remains challenging: because the shape is defined purely implicitly, it cannot be manipulated directly. Thus, parameters of the model are often exposed through abstract sliders, which must be created, nontrivially, by the user for each individual model and understood by others before the model can be modified. Further, each of these sliders needs to be set one by one to achieve the desired appearance. To circumvent this laborious process while preserving editability, we propose to directly manipulate the implicit surface in the viewport. We let the user naturally interact with the output shape, leveraging points on a co-parameterization we design specifically for implicit surfaces, to guide the parameter updates and reach the desired appearance faster. We leverage our automatic differentiation of the procedural implicit surface to propagate interactions made by the user in the viewport to the shape parameters themselves. We further design a solver that uses such information to guide an intuitive and smooth user workflow. We demonstrate different editing processes across multiple implicit shapes and parameters that would be tedious to achieve by tuning sliders.
{"title":"Direct Manipulation of Procedural Implicit Surfaces","authors":"Marzia Riso, Élie Michel, Axel Paris, Valentin Deschaintre, Mathieu Gaillard, Fabio Pellacini","doi":"10.1145/3687936","DOIUrl":"https://doi.org/10.1145/3687936","url":null,"abstract":"Procedural implicit surfaces are a popular representation for shape modeling. They provide a simple framework for complex geometric operations such as Booleans, blending and deformations. However, their editability remains a challenging task: as the definition of the shape is purely implicit, direct manipulation of the shape cannot be performed. Thus, parameters of the model are often exposed through abstract sliders, which have to be nontrivially created by the user and understood by others for each individual model to modify. Further, each of these sliders needs to be set one by one to achieve the desired appearance. To circumvent this laborious process while preserving editability, we propose to directly manipulate the implicit surface in the viewport. We let the user naturally interact with the output shape, leveraging points on a co-parameterization we design specifically for implicit surfaces, to guide the parameter updates and reach the desired appearance faster. We leverage our automatic differentiation of the procedural implicit surface to propagate interactions made by the user in the viewport to the shape parameters themselves. We further design a solver that uses such information to guide an intuitive and smooth user workflow. We demonstrate different editing processes across multiple implicit shapes and parameters that would be tedious by tuning sliders.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"18 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}