Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Slava Elizarov, Ciara Rowles, Simon Donné
arXiv:2409.03718 (arXiv - CS - Graphics, 2024-09-05)

Generating high-quality 3D objects from textual descriptions remains a challenging problem due to computational cost, the scarcity of 3D data, and the complexity of 3D representations. We introduce Geometry Image Diffusion (GIMDiffusion), a novel Text-to-3D model that uses geometry images to represent 3D shapes efficiently as 2D images, thereby avoiding the need for complex 3D-aware architectures. By integrating a Collaborative Control mechanism, we exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion. This enables strong generalization even with limited 3D training data (allowing us to use only high-quality training data) while retaining compatibility with guidance techniques such as IPAdapter. In short, GIMDiffusion generates 3D assets at speeds comparable to current Text-to-Image models. The generated objects consist of semantically meaningful, separate parts and include internal structures, enhancing both usability and versatility.

Volumetric Surfaces: Representing Fuzzy Geometries with Multiple Meshes
Stefano Esposito, Anpei Chen, Christian Reiser, Samuel Rota Bulò, Lorenzo Porzi, Katja Schwarz, Christian Richardt, Michael Zollhöfer, Peter Kontschieder, Andreas Geiger
arXiv:2409.02482 (arXiv - CS - Graphics, 2024-09-04)

High-quality real-time view synthesis methods are based on volume rendering, splatting, or surface rendering. While surface-based methods are generally the fastest, they cannot faithfully model fuzzy geometry like hair. In turn, alpha-blending techniques excel at representing fuzzy materials but require an unbounded number of samples per ray (P1). Further overheads are induced by empty-space skipping in volume rendering (P2) and by sorting input primitives in splatting (P3). These problems are exacerbated on low-performance graphics hardware, e.g., on mobile devices. We present a novel representation for real-time view synthesis in which (P1) the number of sampling locations is small and bounded, (P2) sampling locations are found efficiently via rasterization, and (P3) rendering is sorting-free. We achieve this by representing objects as semi-transparent multi-layer meshes, rendered in fixed layer order from outermost to innermost. We model mesh layers as SDF shells with optimal spacing learned during training. After baking, we fit UV textures to the corresponding meshes. We show that our method can represent challenging fuzzy objects while achieving higher frame rates than volume-based and splatting-based methods on low-end and mobile devices.

A General Albedo Recovery Approach for Aerial Photogrammetric Images through Inverse Rendering
Shuang Song, Rongjun Qin
arXiv:2409.03032 (arXiv - CS - Graphics, 2024-09-04)

Modeling outdoor scenes for synthetic 3D environments requires recovering reflectance/albedo information from raw images, an ill-posed problem due to the complicated unmodeled physics involved (e.g., indirect lighting, volume scattering, specular reflection). The problem remains unsolved in practical contexts. Recovered albedo facilitates model relighting and shading, which can further enhance the realism of rendered models and their use in digital twins. Typically, photogrammetric 3D models simply take the source images as texture materials, which inherently bakes the lighting at capture time into the texture as unwanted artifacts. These polluted textures are therefore suboptimal for realistic rendering in a synthetic environment. In addition, the embedded environmental lighting reduces photo-consistency across images, which in turn causes image-matching uncertainties. This paper presents a general image formation model for albedo recovery from typical aerial photogrammetric images under natural illumination and derives the inverse model to resolve the albedo through inverse-rendering-based intrinsic image decomposition. Our approach builds on the fact that both the sun illumination and the scene geometry are estimable in aerial photogrammetry, so they provide direct inputs for this ill-posed problem. This physics-based approach requires no input beyond the data acquired in a typical drone-based photogrammetric collection and is shown to outperform existing approaches. We also demonstrate that the recovered albedo image can in turn improve typical image-processing tasks in photogrammetry, such as feature and dense matching and edge and line extraction.

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan
arXiv:2409.02095 (arXiv - CS - Graphics, 2024-09-03)

Despite significant advancements in monocular depth estimation for static images, estimating video depth in the open world remains challenging, since open-world videos are extremely diverse in content, motion, camera movement, and length. We present DepthCrafter, an innovative method for generating temporally consistent long depth sequences with intricate details for open-world videos, without requiring any supplementary information such as camera poses or optical flow. DepthCrafter generalizes to open-world videos by training a video-to-depth model from a pre-trained image-to-video diffusion model, through a meticulously designed three-stage training strategy on compiled paired video-depth datasets. Our training approach enables the model to generate depth sequences of variable length, up to 110 frames, in a single pass, and to harvest both precise depth details and rich content diversity from realistic and synthetic datasets. We also propose an inference strategy that processes extremely long videos through segment-wise estimation and seamless stitching. Comprehensive evaluations on multiple datasets reveal that DepthCrafter achieves state-of-the-art performance in open-world video depth estimation under zero-shot settings. Furthermore, DepthCrafter facilitates various downstream applications, including depth-based visual effects and conditional video generation.

Dynamic Motion Synthesis: Masked Audio-Text Conditioned Spatio-Temporal Transformers
Sohan Anisetty, James Hays
arXiv:2409.01591 (arXiv - CS - Graphics, 2024-09-03)

Our research presents a novel motion generation framework designed to produce whole-body motion sequences conditioned on multiple modalities simultaneously, specifically text and audio inputs. Leveraging Vector Quantized Variational Autoencoders (VQVAEs) for motion discretization and a bidirectional Masked Language Modeling (MLM) strategy for efficient token prediction, our approach achieves improved processing efficiency and coherence in the generated motions. By integrating spatial attention mechanisms and a token critic, we ensure consistency and naturalness in the generated motions. This framework expands the possibilities of motion generation, addressing the limitations of existing approaches and opening avenues for multimodal motion synthesis.

AMG: Avatar Motion Guided Video Generation
Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang
arXiv:2409.01502 (arXiv - CS - Graphics, 2024-09-02)

The human video generation task has gained significant attention with the advancement of deep generative models. Generating realistic videos with human movements is inherently challenging due to the intricacies of human body topology and sensitivity to visual artifacts. Extensively studied 2D media generation methods take advantage of massive human media datasets but struggle with 3D-aware control, whereas 3D avatar-based approaches, while offering more freedom of control, lack photorealism and cannot be harmonized seamlessly with the background scene. We propose AMG, a method that combines 2D photorealism with 3D controllability by conditioning video diffusion models on controlled renderings of 3D avatars. We additionally introduce a novel data processing pipeline that reconstructs and renders human avatar movements from dynamic camera videos. AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style. We also demonstrate through extensive evaluation that it outperforms existing human video generation methods conditioned on pose sequences or driving videos in terms of realism and adaptability.

DiffCSG: Differentiable CSG via Rasterization
Haocheng Yuan, Adrien Bousseau, Hao Pan, Chengquan Zhang, Niloy J. Mitra, Changjian Li
arXiv:2409.01421 (arXiv - CS - Graphics, 2024-09-02)

Differentiable rendering is a key ingredient for inverse rendering and machine learning, as it allows optimizing scene parameters (shape, materials, lighting) to best fit target images. Differentiable rendering requires that each scene parameter relate to pixel values through differentiable operations. While 3D mesh rendering algorithms have been implemented in a differentiable way, these algorithms do not directly extend to Constructive Solid Geometry (CSG), a popular parametric representation of shapes, because the underlying boolean operations are typically performed with complex black-box mesh-processing libraries. We present an algorithm, DiffCSG, to render CSG models in a differentiable manner. Our algorithm builds upon CSG rasterization, which displays the result of boolean operations between primitives without explicitly computing the resulting mesh and, as such, bypasses black-box mesh processing. We describe how to implement CSG rasterization within a differentiable rendering pipeline, taking special care to apply antialiasing along primitive intersections to obtain gradients in such critical areas. Our algorithm is simple and fast, can be easily incorporated into modern machine learning setups, and enables a range of applications for computer-aided design, including direct and image-based editing of CSG primitives. Code and data: https://yyyyyhc.github.io/DiffCSG/.

Curvy: A Parametric Cross-section based Surface Reconstruction
Aradhya N. Mathur, Apoorv Khattar, Ojaswa Sharma
arXiv:2409.00829 (arXiv - CS - Graphics, 2024-09-01)

In this work, we present a novel approach for reconstructing shape point clouds from planar sparse cross-sections with the help of generative modeling. We highlight the unique challenges pertaining to representation and reconstruction in this problem setting. Most methods in the classical literature lack the ability to generalize based on object class and employ complex mathematical machinery to reconstruct reliable surfaces. We present a simple learnable approach that generates a large number of points from a small number of input cross-sections over a large dataset. We represent the cross-sections with a compact parametric polyline representation obtained through adaptive splitting, and we train a Graph Neural Network to reconstruct the underlying shape adaptively, reducing the dependence on the number of cross-sections provided.

GroomCap: High-Fidelity Prior-Free Hair Capture
Yuxiao Zhou, Menglei Chai, Daoye Wang, Sebastian Winberg, Erroll Wood, Kripasindhu Sarkar, Markus Gross, Thabo Beeler
arXiv:2409.00831 (arXiv - CS - Graphics, 2024-09-01)

Despite recent advances in multi-view hair reconstruction, achieving strand-level precision remains a significant challenge due to inherent limitations in existing capture pipelines. We introduce GroomCap, a novel multi-view hair capture method that reconstructs faithful and high-fidelity hair geometry without relying on external data priors. To address the limitations of conventional reconstruction algorithms, we propose a neural implicit representation for hair volume that encodes high-resolution 3D orientation and occupancy from input views. This implicit hair volume is trained with a new volumetric 3D orientation rendering algorithm, coupled with 2D orientation distribution supervision, to effectively prevent the loss of structural information caused by undesired orientation blending. We further propose a Gaussian-based hair optimization strategy to refine the traced hair strands with a novel chained Gaussian representation, utilizing direct photometric supervision from images. Our results demonstrate that GroomCap is able to capture high-quality hair geometries that are not only more precise and detailed than existing methods but also versatile enough for a range of applications.

Mastoidectomy Multi-View Synthesis from a Single Microscopy Image
Yike Zhang, Jack Noble
arXiv:2409.03190 (arXiv - CS - Graphics, 2024-08-31)

Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purpose. We manually align the surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope. We then perform UV projection to transfer the colors from the frame to the surface textures. Novel views of the textured surface can be used to generate a large dataset of synthetic frames with ground-truth poses. We evaluated the quality of synthetic views rendered using PyTorch3D and PyVista, and found that both rendering engines produce similarly high-quality synthetic novel-view frames, with a structural similarity index (SSIM) relative to ground truth averaging about 0.86 for both. A large dataset of novel views with known poses is critical for ongoing training of a method that automatically estimates microscope pose for 2D-to-3D registration with the pre-operative CT to facilitate augmented-reality surgery. This dataset will empower various downstream tasks, such as integrating Augmented Reality (AR) in the operating room, tracking surgical tools, and supporting other video analysis studies.