Capturing and rendering novel views of complex real-world scenes is a long-standing problem in computer graphics and vision, with applications in augmented and virtual reality, immersive experiences and 3D photography. The advent of deep learning has enabled revolutionary advances in this area, classically known as image-based rendering. However, previous approaches require intractably dense view sampling or provide little or no guidance for how users should sample views of a scene to reliably render high-quality novel views. Local light field fusion proposes an algorithm for practical view synthesis from an irregular grid of sampled views that first expands each sampled view into a local light field via a multiplane image scene representation, then renders novel views by blending adjacent local light fields. Crucially, we extend traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using our algorithm. We achieve the perceptual quality of Nyquist rate view sampling while using up to 4000x fewer views. Subsequent developments have led to new scene representations for deep learning with view synthesis, notably neural radiance fields, but the problem of sparse view synthesis from a small number of images has only grown in importance. We reprise some of the recent results on sparse and even single image view synthesis, while posing the question of whether prescriptive sampling guidelines are feasible for the new generation of image-based rendering algorithms.
{"title":"Sampling for View Synthesis: From Local Light Field Fusion to Neural Radiance Fields and Beyond","authors":"Ravi Ramamoorthi","doi":"arxiv-2408.04586","DOIUrl":"https://doi.org/arxiv-2408.04586","url":null,"abstract":"Capturing and rendering novel views of complex real-world scenes is a\u0000long-standing problem in computer graphics and vision, with applications in\u0000augmented and virtual reality, immersive experiences and 3D photography. The\u0000advent of deep learning has enabled revolutionary advances in this area,\u0000classically known as image-based rendering. However, previous approaches\u0000require intractably dense view sampling or provide little or no guidance for\u0000how users should sample views of a scene to reliably render high-quality novel\u0000views. Local light field fusion proposes an algorithm for practical view\u0000synthesis from an irregular grid of sampled views that first expands each\u0000sampled view into a local light field via a multiplane image scene\u0000representation, then renders novel views by blending adjacent local light\u0000fields. Crucially, we extend traditional plenoptic sampling theory to derive a\u0000bound that specifies precisely how densely users should sample views of a given\u0000scene when using our algorithm. We achieve the perceptual quality of Nyquist\u0000rate view sampling while using up to 4000x fewer views. Subsequent developments\u0000have led to new scene representations for deep learning with view synthesis,\u0000notably neural radiance fields, but the problem of sparse view synthesis from a\u0000small number of images has only grown in importance. We reprise some of the\u0000recent results on sparse and even single image view synthesis, while posing the\u0000question of whether prescriptive sampling guidelines are feasible for the new\u0000generation of image-based rendering algorithms.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li
3D content generation is at the heart of many computer graphics applications, including video gaming, film-making, virtual and augmented reality, etc. This paper proposes a novel deep-learning based approach for automatically generating interactive and playable 3D game scenes, all from the user's casual prompts such as a hand-drawn sketch. Sketch-based input offers a natural and convenient way to convey the user's design intention in the content creation process. To circumvent the data-deficient challenge in learning (i.e., the lack of large-scale training data of 3D scenes), our method leverages a pre-trained 2D denoising diffusion model to generate a 2D image of the scene as the conceptual guidance. In this process, we adopt the isometric projection mode to factor out unknown camera poses while obtaining the scene layout. From the generated isometric image, we use a pre-trained image understanding method to segment the image into meaningful parts, such as off-ground objects, trees, and buildings, and extract the 2D scene layout. These segments and layouts are subsequently fed into a procedural content generation (PCG) engine, such as one built on a 3D game engine like Unity or Unreal, to create the 3D scene. The resulting 3D scene can be seamlessly integrated into a game development environment and is readily playable. Extensive tests demonstrate that our method can efficiently generate high-quality and interactive 3D game scenes with layouts that closely follow the user's intention.
{"title":"Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches","authors":"Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li","doi":"arxiv-2408.04567","DOIUrl":"https://doi.org/arxiv-2408.04567","url":null,"abstract":"3D Content Generation is at the heart of many computer graphics applications,\u0000including video gaming, film-making, virtual and augmented reality, etc. This\u0000paper proposes a novel deep-learning based approach for automatically\u0000generating interactive and playable 3D game scenes, all from the user's casual\u0000prompts such as a hand-drawn sketch. Sketch-based input offers a natural, and\u0000convenient way to convey the user's design intention in the content creation\u0000process. To circumvent the data-deficient challenge in learning (i.e. the lack\u0000of large training data of 3D scenes), our method leverages a pre-trained 2D\u0000denoising diffusion model to generate a 2D image of the scene as the conceptual\u0000guidance. In this process, we adopt the isometric projection mode to factor out\u0000unknown camera poses while obtaining the scene layout. From the generated\u0000isometric image, we use a pre-trained image understanding method to segment the\u0000image into meaningful parts, such as off-ground objects, trees, and buildings,\u0000and extract the 2D scene layout. These segments and layouts are subsequently\u0000fed into a procedural content generation (PCG) engine, such as a 3D video game\u0000engine like Unity or Unreal, to create the 3D scene. The resulting 3D scene can\u0000be seamlessly integrated into a game development environment and is readily\u0000playable. Extensive tests demonstrate that our method can efficiently generate\u0000high-quality and interactive 3D game scenes with layouts that closely follow\u0000the user's intention.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The generalized winding number is an essential part of the geometry processing toolkit, allowing one to quantify how much a given point is inside a surface, often represented by a mesh or a point cloud, even when the surface is open, noisy, or non-manifold. Parameterized surfaces, which often contain intentional and unintentional gaps and imprecisions, would also benefit from a generalized winding number. Standard methods to compute it, however, rely on a surface integral, which is challenging to compute without discretizing the surface and thus sacrifices the precision that is characteristic of parametric surfaces. We propose an alternative method to compute a generalized winding number, based only on the surface boundary and the intersections of a single ray with the surface. For parametric surfaces, we show that all the necessary operations can be done via a Sum-of-Squares (SOS) formulation, thus computing generalized winding numbers without surface discretization, to machine precision. We show that by discretizing only the boundary of the surface, this becomes an efficient method. We demonstrate an application of our method to the problem of computing a generalized winding number of a surface represented by a curve network, where each curve loop is surfaced via the Laplace equation. We use the Boundary Element Method to express the solution as a parametric surface, allowing us to apply our method without meshing the surfaces. As a bonus, we also demonstrate that for meshes with many triangles and a simple boundary, our method is faster than hierarchical evaluation of the generalized winding number while still being precise. We validate our algorithms theoretically, numerically, and by demonstrating a gallery of results on a variety of parametric surfaces and meshes, as well as uses in a variety of applications, including voxelizations and boolean operations.
{"title":"One-Shot Method for Computing Generalized Winding Numbers","authors":"Cedric Martens, Mikhail Bessmeltsev","doi":"arxiv-2408.04466","DOIUrl":"https://doi.org/arxiv-2408.04466","url":null,"abstract":"The generalized winding number is an essential part of the geometry\u0000processing toolkit, allowing to quantify how much a given point is inside a\u0000surface, often represented by a mesh or a point cloud, even when the surface is\u0000open, noisy, or non-manifold. Parameterized surfaces, which often contain\u0000intentional and unintentional gaps and imprecisions, would also benefit from a\u0000generalized winding number. Standard methods to compute it, however, rely on a\u0000surface integral, challenging to compute without surface discretization,\u0000leading to loss of precision characteristic of parametric surfaces. We propose an alternative method to compute a generalized winding number,\u0000based only on the surface boundary and the intersections of a single ray with\u0000the surface. For parametric surfaces, we show that all the necessary operations\u0000can be done via a Sum-of-Squares (SOS) formulation, thus computing generalized\u0000winding numbers without surface discretization with machine precision. We show\u0000that by discretizing only the boundary of the surface, this becomes an\u0000efficient method. We demonstrate an application of our method to the problem of computing a\u0000generalized winding number of a surface represented by a curve network, where\u0000each curve loop is surfaced via Laplace equation. We use the Boundary Element\u0000Method to express the solution as a parametric surface, allowing us to apply\u0000our method without meshing the surfaces. As a bonus, we also demonstrate that\u0000for meshes with many triangles and a simple boundary, our method is faster than\u0000the hierarchical evaluation of the generalized winding number while still being\u0000precise. We validate our algorithms theoretically, numerically, and by demonstrating a\u0000gallery of results new{on a variety of parametric surfaces and meshes}, as\u0000well uses in a variety of applications, including voxelizations and boolean\u0000operations.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongcheng Song, Dmitry Kachkovski, Shaimaa Monem, Abraham Kassauhun Negash, David I. W. Levin
In this work, we show that exploiting additional variables in a mixed finite element formulation of deformation leads to an efficient physics-based character skinning algorithm. Taking a user-defined rig as input, we show how to efficiently compute deformations of the character mesh that respect artist-supplied handle positions and orientations, without requiring complicated constraints on the physics solver, which can cause poor performance. Rather, we demonstrate an efficient, user-controllable skinning pipeline that can generate compelling character deformations using a variety of physics material models.
{"title":"Automatic Skinning using the Mixed Finite Element Method","authors":"Hongcheng Song, Dmitry Kachkovski, Shaimaa Monem, Abraham Kassauhun Negash, David I. W. Levin","doi":"arxiv-2408.04066","DOIUrl":"https://doi.org/arxiv-2408.04066","url":null,"abstract":"In this work, we show that exploiting additional variables in a mixed finite\u0000element formulation of deformation leads to an efficient physics-based\u0000character skinning algorithm. Taking as input, a user-defined rig, we show how\u0000to efficiently compute deformations of the character mesh which respect\u0000artist-supplied handle positions and orientations, but without requiring\u0000complicated constraints on the physics solver, which can cause poor\u0000performance. Rather we demonstrate an efficient, user controllable skinning\u0000pipeline that can generate compelling character deformations, using a variety\u0000of physics material models.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an approach to decomposing animated graphics into sprites, a set of basic elements or layers. Our approach builds on the optimization of sprite parameters to fit the raster video. For efficiency, we assume static textures for sprites to reduce the search space, while preventing artifacts using a texture prior model. To further speed up the optimization, we initialize the sprite parameters using a pre-trained video object segmentation model and user-provided single-frame annotations. For our study, we construct the Crello Animation dataset from an online design service and define quantitative metrics to measure the quality of the extracted sprites. Experiments show that our method significantly outperforms baselines for similar decomposition tasks in terms of the quality/efficiency trade-off.
{"title":"Fast Sprite Decomposition from Animated Graphics","authors":"Tomoyuki Suzuki, Kotaro Kikuchi, Kota Yamaguchi","doi":"arxiv-2408.03923","DOIUrl":"https://doi.org/arxiv-2408.03923","url":null,"abstract":"This paper presents an approach to decomposing animated graphics into\u0000sprites, a set of basic elements or layers. Our approach builds on the\u0000optimization of sprite parameters to fit the raster video. For efficiency, we\u0000assume static textures for sprites to reduce the search space while preventing\u0000artifacts using a texture prior model. To further speed up the optimization, we\u0000introduce the initialization of the sprite parameters utilizing a pre-trained\u0000video object segmentation model and user input of single frame annotations. For\u0000our study, we construct the Crello Animation dataset from an online design\u0000service and define quantitative metrics to measure the quality of the extracted\u0000sprites. Experiments show that our method significantly outperforms baselines\u0000for similar decomposition tasks in terms of the quality/efficiency tradeoff.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differentiable volumetric rendering-based methods have made significant progress in novel view synthesis. On one hand, innovative methods have replaced the Neural Radiance Fields (NeRF) network with locally parameterized structures, enabling high-quality renderings in a reasonable time. On the other hand, approaches have used differentiable splatting instead of NeRF's ray casting to optimize radiance fields rapidly using Gaussian kernels, allowing for fine adaptation to the scene. However, differentiable ray casting of irregularly spaced kernels has been scarcely explored, while splatting, despite enabling fast rendering times, is susceptible to clearly visible artifacts. Our work closes this gap by providing a physically consistent formulation of the emitted radiance c and density σ, decomposed with Gaussian functions associated with Spherical Gaussians/Harmonics for an all-frequency colorimetric representation. We also introduce a method enabling differentiable ray casting of irregularly distributed Gaussians, using an algorithm that integrates radiance fields slab by slab and leverages a BVH structure. This allows our approach to adapt finely to the scene while avoiding splatting artifacts. As a result, we achieve superior rendering quality compared to the state of the art while maintaining reasonable training times and achieving inference speeds of 25 FPS on the Blender dataset. Project page with videos and code: https://raygauss.github.io/
{"title":"RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis","authors":"Hugo Blanc, Jean-Emmanuel Deschaud, Alexis Paljic","doi":"arxiv-2408.03356","DOIUrl":"https://doi.org/arxiv-2408.03356","url":null,"abstract":"Differentiable volumetric rendering-based methods made significant progress\u0000in novel view synthesis. On one hand, innovative methods have replaced the\u0000Neural Radiance Fields (NeRF) network with locally parameterized structures,\u0000enabling high-quality renderings in a reasonable time. On the other hand,\u0000approaches have used differentiable splatting instead of NeRF's ray casting to\u0000optimize radiance fields rapidly using Gaussian kernels, allowing for fine\u0000adaptation to the scene. However, differentiable ray casting of irregularly\u0000spaced kernels has been scarcely explored, while splatting, despite enabling\u0000fast rendering times, is susceptible to clearly visible artifacts. Our work closes this gap by providing a physically consistent formulation of\u0000the emitted radiance c and density {sigma}, decomposed with Gaussian functions\u0000associated with Spherical Gaussians/Harmonics for all-frequency colorimetric\u0000representation. We also introduce a method enabling differentiable ray casting\u0000of irregularly distributed Gaussians using an algorithm that integrates\u0000radiance fields slab by slab and leverages a BVH structure. This allows our\u0000approach to finely adapt to the scene while avoiding splatting artifacts. As a\u0000result, we achieve superior rendering quality compared to the state-of-the-art\u0000while maintaining reasonable training times and achieving inference speeds of\u000025 FPS on the Blender dataset. Project page with videos and code:\u0000https://raygauss.github.io/","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in polygonal meshes. This method allows us to use image generation models, such as Diffusion Transformers, directly for 3D shape generation. Evaluated on the ABO dataset, our generated shapes with patch structures achieve point cloud FID comparable to recent 3D generative models, while naturally supporting PBR material generation.
{"title":"An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion","authors":"Xingguang Yan, Han-Hung Lee, Ziyu Wan, Angel X. Chang","doi":"arxiv-2408.03178","DOIUrl":"https://doi.org/arxiv-2408.03178","url":null,"abstract":"We introduce a new approach for generating realistic 3D models with UV maps\u0000through a representation termed \"Object Images.\" This approach encapsulates\u0000surface geometry, appearance, and patch structures within a 64x64 pixel image,\u0000effectively converting complex 3D shapes into a more manageable 2D format. By\u0000doing so, we address the challenges of both geometric and semantic irregularity\u0000inherent in polygonal meshes. This method allows us to use image generation\u0000models, such as Diffusion Transformers, directly for 3D shape generation.\u0000Evaluated on the ABO dataset, our generated shapes with patch structures\u0000achieve point cloud FID comparable to recent 3D generative models, while\u0000naturally supporting PBR material generation.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tengfei Wang, Zongqian Zhan, Rui Xia, Linxia Ji, Xin Wang
Over the last few decades, image-based building surface reconstruction has garnered substantial research interest and has been applied across various fields, such as heritage preservation, architectural planning, etc. Compared to traditional photogrammetric and NeRF-based solutions, Gaussian fields-based methods have recently exhibited significant potential in generating surface meshes, owing to their time-efficient training and detailed preservation of 3D information. However, most Gaussian fields-based methods are trained with all image pixels, encompassing building and non-building areas, which results in significant noise in the building meshes and degraded time efficiency. This paper proposes a novel framework, Masked Gaussian Fields (MGFs), designed to generate accurate surface reconstructions of buildings in a time-efficient way. The framework first applies EfficientSAM and COLMAP to generate multi-level masks of the building and the corresponding masked point clouds. Subsequently, the masked Gaussian fields are trained by integrating two innovative losses: a multi-level perceptual masked loss focused on constructing building regions, and a boundary loss aimed at enhancing the details of the boundaries between different masks. Finally, we improve the tetrahedral surface mesh extraction method based on the masked Gaussian spheres. Comprehensive experiments on UAV images demonstrate that, compared to a traditional method and several NeRF-based and Gaussian-based SOTA solutions, our approach significantly improves both the accuracy and efficiency of building surface reconstruction. Notably, as a byproduct, there is an additional gain in novel view synthesis of the building.
{"title":"MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images","authors":"Tengfei Wang, Zongqian Zhan, Rui Xia, Linxia Ji, Xin Wang","doi":"arxiv-2408.03060","DOIUrl":"https://doi.org/arxiv-2408.03060","url":null,"abstract":"Over the last few decades, image-based building surface reconstruction has\u0000garnered substantial research interest and has been applied across various\u0000fields, such as heritage preservation, architectural planning, etc. Compared to\u0000the traditional photogrammetric and NeRF-based solutions, recently, Gaussian\u0000fields-based methods have exhibited significant potential in generating surface\u0000meshes due to their time-efficient training and detailed 3D information\u0000preservation. However, most gaussian fields-based methods are trained with all\u0000image pixels, encompassing building and nonbuilding areas, which results in a\u0000significant noise for building meshes and degeneration in time efficiency. This\u0000paper proposes a novel framework, Masked Gaussian Fields (MGFs), designed to\u0000generate accurate surface reconstruction for building in a time-efficient way.\u0000The framework first applies EfficientSAM and COLMAP to generate multi-level\u0000masks of building and the corresponding masked point clouds. Subsequently, the\u0000masked gaussian fields are trained by integrating two innovative losses: a\u0000multi-level perceptual masked loss focused on constructing building regions and\u0000a boundary loss aimed at enhancing the details of the boundaries between\u0000different masks. Finally, we improve the tetrahedral surface mesh extraction\u0000method based on the masked gaussian spheres. Comprehensive experiments on UAV\u0000images demonstrate that, compared to the traditional method and several\u0000NeRF-based and Gaussian-based SOTA solutions, our approach significantly\u0000improves both the accuracy and efficiency of building surface reconstruction.\u0000Notably, as a byproduct, there is an additional gain in the novel view\u0000synthesis of building.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dimitris Angelis, Prodromos Kolyvakis, Manos Kamarianakis, George Papagiannakis
This paper introduces a novel integration of Large Language Models (LLMs) with Conformal Geometric Algebra (CGA) to revolutionize controllable 3D scene editing, particularly for object repositioning tasks, which traditionally require intricate manual processes and specialized expertise. Existing approaches typically either rely on large training datasets or lack a formalized language for precise edits. Utilizing CGA as a robust formal language, our system, shenlong, precisely models the spatial transformations necessary for accurate object repositioning. Leveraging the zero-shot learning capabilities of pre-trained LLMs, shenlong translates natural language instructions into CGA operations, which are then applied to the scene, facilitating exact spatial transformations within 3D scenes without the need for specialized pre-training. Implemented in a realistic simulation environment, shenlong ensures compatibility with existing graphics pipelines. To accurately assess the impact of CGA, we benchmark against robust Euclidean-space baselines, evaluating both latency and accuracy. Comparative performance evaluations indicate that shenlong significantly reduces LLM response times by 16% and boosts success rates by 9.6% on average compared to traditional methods. Notably, shenlong achieves a 100% success rate on common practical queries, a benchmark where other systems fall short. These advancements underscore shenlong's potential to democratize 3D scene editing, enhancing accessibility and fostering innovation across sectors such as education, digital entertainment, and virtual reality.
{"title":"Geometric Algebra Meets Large Language Models: Instruction-Based Transformations of Separate Meshes in 3D, Interactive and Controllable Scenes","authors":"Dimitris Angelis, Prodromos Kolyvakis, Manos Kamarianakis, George Papagiannakis","doi":"arxiv-2408.02275","DOIUrl":"https://doi.org/arxiv-2408.02275","url":null,"abstract":"This paper introduces a novel integration of Large Language Models (LLMs)\u0000with Conformal Geometric Algebra (CGA) to revolutionize controllable 3D scene\u0000editing, particularly for object repositioning tasks, which traditionally\u0000requires intricate manual processes and specialized expertise. These\u0000conventional methods typically suffer from reliance on large training datasets\u0000or lack a formalized language for precise edits. Utilizing CGA as a robust\u0000formal language, our system, shenlong, precisely models spatial transformations\u0000necessary for accurate object repositioning. Leveraging the zero-shot learning\u0000capabilities of pre-trained LLMs, shenlong translates natural language\u0000instructions into CGA operations which are then applied to the scene,\u0000facilitating exact spatial transformations within 3D scenes without the need\u0000for specialized pre-training. Implemented in a realistic simulation\u0000environment, shenlong ensures compatibility with existing graphics pipelines.\u0000To accurately assess the impact of CGA, we benchmark against robust Euclidean\u0000Space baselines, evaluating both latency and accuracy. Comparative performance\u0000evaluations indicate that shenlong significantly reduces LLM response times by\u000016% and boosts success rates by 9.6% on average compared to the traditional\u0000methods. Notably, shenlong achieves a 100% perfect success rate in common\u0000practical queries, a benchmark where other systems fall short. These\u0000advancements underscore shenlong's potential to democratize 3D scene editing,\u0000enhancing accessibility and fostering innovation across sectors such as\u0000education, digital entertainment, and virtual reality.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"100 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva
Despite advances in text-to-3D generation methods, the generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We present SceneMotifCoder (SMC), an example-driven framework for generating 3D object arrangements through visual program learning. SMC leverages large language models (LLMs) and program synthesis to overcome these challenges by learning visual programs from example arrangements. These programs are generalized into compact, editable meta-programs. When combined with 3D object retrieval and geometry-aware optimization, they can be used to create object arrangements that vary in arrangement structure and contained objects. Our experiments show that SMC generates high-quality arrangements using meta-programs learned from only a few examples. Evaluation results demonstrate that object arrangements generated by SMC better conform to user-specified text descriptions and are more physically plausible when compared with state-of-the-art text-to-3D generation and layout methods.
{"title":"SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements","authors":"Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva","doi":"arxiv-2408.02211","DOIUrl":"https://doi.org/arxiv-2408.02211","url":null,"abstract":"Despite advances in text-to-3D generation methods, generation of multi-object\u0000arrangements remains challenging. Current methods exhibit failures in\u0000generating physically plausible arrangements that respect the provided text\u0000description. We present SceneMotifCoder (SMC), an example-driven framework for\u0000generating 3D object arrangements through visual program learning. SMC\u0000leverages large language models (LLMs) and program synthesis to overcome these\u0000challenges by learning visual programs from example arrangements. These\u0000programs are generalized into compact, editable meta-programs. When combined\u0000with 3D object retrieval and geometry-aware optimization, they can be used to\u0000create object arrangements varying in arrangement structure and contained\u0000objects. Our experiments show that SMC generates high-quality arrangements\u0000using meta-programs learned from few examples. Evaluation results demonstrates\u0000that object arrangements generated by SMC better conform to user-specified text\u0000descriptions and are more physically plausible when compared with\u0000state-of-the-art text-to-3D generation and layout methods.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}