Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges
We introduce Gaussian Garments, a novel approach for reconstructing realistic simulation-ready garment assets from multi-view videos. Our method represents garments with a combination of a 3D mesh and a Gaussian texture that encodes both the color and high-frequency surface details. This representation enables accurate registration of garment geometries to multi-view videos and helps disentangle albedo textures from lighting effects. Furthermore, we demonstrate how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the real behavior of each garment. The reconstructed Gaussian Garments can be automatically combined into multi-garment outfits and animated with the fine-tuned GNN.
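As a rough illustration of the asset described above, the sketch below shows one way a "mesh plus Gaussian texture" garment could be organized in code; the field names and per-texel layout are assumptions for illustration, not the authors' actual data format.

```python
# Minimal sketch (not the authors' code) of a mesh-plus-Gaussian-texture asset.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianGarment:
    vertices: np.ndarray   # (V, 3) rest-pose mesh vertices
    faces: np.ndarray      # (F, 3) triangle indices
    uvs: np.ndarray        # (V, 2) UV coordinates for texture lookup
    # Gaussian texture: one Gaussian per texel, storing appearance and
    # high-frequency surface detail relative to the mesh surface.
    albedo: np.ndarray     # (H, W, 3) base color with lighting disentangled
    offset: np.ndarray     # (H, W, 3) positional offset from the surface
    scale: np.ndarray      # (H, W, 3) per-axis Gaussian scales
    rotation: np.ndarray   # (H, W, 4) quaternions
    opacity: np.ndarray    # (H, W) per-Gaussian opacity

    def num_gaussians(self) -> int:
        return self.albedo.shape[0] * self.albedo.shape[1]
```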
{"title":"Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video","authors":"Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges","doi":"arxiv-2409.08189","DOIUrl":"https://doi.org/arxiv-2409.08189","url":null,"abstract":"We introduce Gaussian Garments, a novel approach for reconstructing realistic\u0000simulation-ready garment assets from multi-view videos. Our method represents\u0000garments with a combination of a 3D mesh and a Gaussian texture that encodes\u0000both the color and high-frequency surface details. This representation enables\u0000accurate registration of garment geometries to multi-view videos and helps\u0000disentangle albedo textures from lighting effects. Furthermore, we demonstrate\u0000how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the\u0000real behavior of each garment. The reconstructed Gaussian Garments can be\u0000automatically combined into multi-garment outfits and animated with the\u0000fine-tuned GNN.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel framework for converting 2D videos into immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experiences. Leveraging foundation models as priors, our approach overcomes the limitations of traditional methods and boosts performance to ensure the high-fidelity generation required by display devices. The proposed system consists of two main steps: depth-based video splatting for warping and extracting occlusion masks, and stereo video inpainting. We utilize pre-trained Stable Video Diffusion as the backbone and introduce a fine-tuning protocol for the stereo video inpainting task. To handle input videos of varying length and resolution, we explore auto-regressive strategies and tiled processing. Finally, a sophisticated data processing pipeline has been developed to reconstruct a large-scale, high-quality dataset to support our training. Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices like the Apple Vision Pro and 3D displays. In summary, this work contributes to the field by presenting an effective method for generating high-quality stereoscopic videos from monocular input, potentially transforming how we experience digital media.
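The first stage, depth-based splatting, can be pictured with a minimal forward-warping sketch: shift each left-view pixel by its disparity, resolve collisions with a depth test, and mark the pixels that receive no source as the occlusion mask handed to the inpainting stage. The sign convention and the per-pixel loop are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def splat_to_right_view(left: np.ndarray, disparity: np.ndarray):
    """left: (H, W, 3) image; disparity: (H, W) in pixels (assumed positive shifts left)."""
    H, W, _ = left.shape
    right = np.zeros_like(left)
    depth_buf = np.full((H, W), -np.inf)     # keep the nearest (largest-disparity) pixel
    occluded = np.ones((H, W), dtype=bool)   # True until some source pixel lands here
    ys, xs = np.mgrid[0:H, 0:W]
    target_x = np.round(xs - disparity).astype(int)
    valid = (target_x >= 0) & (target_x < W)
    for y, x, tx in zip(ys[valid], xs[valid], target_x[valid]):
        if disparity[y, x] > depth_buf[y, tx]:   # z-test: closer content wins
            depth_buf[y, tx] = disparity[y, x]
            right[y, tx] = left[y, x]
            occluded[y, tx] = False
    return right, occluded  # occluded pixels are the stereo-inpainting targets
```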
{"title":"StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos","authors":"Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan","doi":"arxiv-2409.07447","DOIUrl":"https://doi.org/arxiv-2409.07447","url":null,"abstract":"This paper presents a novel framework for converting 2D videos to immersive\u0000stereoscopic 3D, addressing the growing demand for 3D content in immersive\u0000experience. Leveraging foundation models as priors, our approach overcomes the\u0000limitations of traditional methods and boosts the performance to ensure the\u0000high-fidelity generation required by the display devices. The proposed system\u0000consists of two main steps: depth-based video splatting for warping and\u0000extracting occlusion mask, and stereo video inpainting. We utilize pre-trained\u0000stable video diffusion as the backbone and introduce a fine-tuning protocol for\u0000the stereo video inpainting task. To handle input video with varying lengths\u0000and resolutions, we explore auto-regressive strategies and tiled processing.\u0000Finally, a sophisticated data processing pipeline has been developed to\u0000reconstruct a large-scale and high-quality dataset to support our training. Our\u0000framework demonstrates significant improvements in 2D-to-3D video conversion,\u0000offering a practical solution for creating immersive content for 3D devices\u0000like Apple Vision Pro and 3D displays. In summary, this work contributes to the\u0000field by presenting an effective method for generating high-quality\u0000stereoscopic videos from monocular input, potentially transforming how we\u0000experience digital media.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura
We propose GauFace, a novel Gaussian Splatting representation tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction at 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. We then introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions into the GauFace representation. With its rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies against traditional offline and online renderers as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation across various platforms, including PCs, phones and even VR headsets.
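The pixel-aligned UV positional encoding can be illustrated with a standard sinusoidal encoding over texture coordinates; the frequency count and feature layout below are assumptions rather than GauFace's exact scheme.

```python
import torch

def uv_positional_encoding(uv: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """uv: (..., 2) coordinates in [0, 1]. Returns (..., 4 * num_freqs) features."""
    freqs = 2.0 ** torch.arange(num_freqs, device=uv.device) * torch.pi   # (F,)
    angles = uv.unsqueeze(-1) * freqs                                     # (..., 2, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)       # (..., 2, 2F)
    return enc.flatten(-2)                                                # (..., 4F)
```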
{"title":"Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering","authors":"Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura","doi":"arxiv-2409.07441","DOIUrl":"https://doi.org/arxiv-2409.07441","url":null,"abstract":"We propose GauFace, a novel Gaussian Splatting representation, tailored for\u0000efficient animation and rendering of physically-based facial assets. Leveraging\u0000strong geometric priors and constrained optimization, GauFace ensures a neat\u0000and structured Gaussian representation, delivering high fidelity and real-time\u0000facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. Then, we introduce TransGS, a diffusion transformer that instantly translates\u0000physically-based facial assets into the corresponding GauFace representations.\u0000Specifically, we adopt a patch-based pipeline to handle the vast number of\u0000Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme\u0000with UV positional encoding to ensure the throughput and rendering quality of\u0000GauFace assets generated by our TransGS. Once trained, TransGS can instantly\u0000translate facial assets with lighting conditions to GauFace representation,\u0000With the rich conditioning modalities, it also enables editing and animation\u0000capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies, compared to traditional\u0000offline and online renderers, as well as recent neural rendering methods, which\u0000demonstrate the superior performance of our approach for facial asset\u0000rendering. We also showcase diverse immersive applications of facial assets\u0000using our TransGS approach and GauFace representation, across various platforms\u0000like PCs, phones and even VR headsets.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera
The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities due to imprecise guidance. Additionally, while recent advancements in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.
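In its simplest form, "aligning Gaussians close to the surface" could look like the sketch below: each Gaussian center is pulled toward its nearest sample on an estimated surface. This is a hedged guess at the spirit of the densification step, not the paper's algorithm; the step size and KD-tree lookup are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def align_gaussians_to_surface(centers: np.ndarray,
                               surface_pts: np.ndarray,
                               step: float = 0.5) -> np.ndarray:
    """centers: (N, 3) Gaussian means; surface_pts: (M, 3) estimated surface samples."""
    tree = cKDTree(surface_pts)
    _, idx = tree.query(centers, k=1)        # nearest surface sample per Gaussian
    nearest = surface_pts[idx]
    # Move each center a fraction of the way toward the surface per iteration.
    return centers + step * (nearest - centers)
```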
{"title":"MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification","authors":"Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera","doi":"arxiv-2409.06620","DOIUrl":"https://doi.org/arxiv-2409.06620","url":null,"abstract":"The field of text-to-3D content generation has made significant progress in\u0000generating realistic 3D objects, with existing methodologies like Score\u0000Distillation Sampling (SDS) offering promising guidance. However, these methods\u0000often encounter the \"Janus\" problem-multi-face ambiguities due to imprecise\u0000guidance. Additionally, while recent advancements in 3D gaussian splitting have\u0000shown its efficacy in representing 3D volumes, optimization of this\u0000representation remains largely unexplored. This paper introduces a unified\u0000framework for text-to-3D content generation that addresses these critical gaps.\u0000Our approach utilizes multi-view guidance to iteratively form the structure of\u0000the 3D model, progressively enhancing detail and accuracy. We also introduce a\u0000novel densification algorithm that aligns gaussians close to the surface,\u0000optimizing the structural integrity and fidelity of the generated models.\u0000Extensive experiments validate our approach, demonstrating that it produces\u0000high-quality visual outputs with minimal time cost. Notably, our method\u0000achieves high-quality results within half an hour of training, offering a\u0000substantial efficiency gain over most existing methods, which require hours of\u0000training time to achieve comparable results.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zixuan Li, Pengfei Shen, Hanxiao Sun, Zibo Zhang, Yu Guo, Ligang Liu, Ling-Qi Yan, Steve Marschner, Milos Hasan, Beibei Wang
Accurately rendering the appearance of fabrics is challenging, due to their complex 3D microstructures and specialized optical properties. If we model the geometry and optics of fabrics down to the fiber level, we can achieve unprecedented rendering realism, but this raises the difficulty of authoring or capturing the fiber-level assets. Existing approaches can obtain fiber-level geometry with special devices (e.g., CT) or complex hand-designed procedural pipelines (manually tweaking a set of parameters). In this paper, we propose a unified framework to capture fiber-level geometry and appearance of woven fabrics using a single low-cost microscope image. We first use a simple neural network to predict initial parameters of our geometric and appearance models. From this starting point, we further optimize the parameters of procedural fiber geometry and an approximated shading model via differentiable rasterization to match the microscope photo more accurately. Finally, we refine the fiber appearance parameters via differentiable path tracing, converging to accurate fiber optical parameters, which are suitable for physically-based light simulations to produce high-quality rendered results. We believe that our method is the first to utilize differentiable rendering at the microscopic level, supporting physically-based scattering from explicit fiber assemblies. Our fabric parameter estimation achieves high-quality re-rendering of measured woven fabric samples in both distant and close-up views. These results can further be used for efficient rendering or converted to downstream representations. We also propose a patch-space fiber geometry procedural generation and a two-scale path tracing framework for efficient rendering of fabric scenes.
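The staged fitting described above can be summarized as two nested optimization loops. In the sketch below, `rasterize` and `path_trace` are placeholders standing in for the paper's differentiable rasterizer and path tracer, and the loss, step counts, and learning rates are illustrative assumptions.

```python
import torch

def fit_fabric(photo, geom_params, appearance_params,
               rasterize, path_trace, steps=(500, 200), lrs=(1e-2, 1e-3)):
    """geom_params / appearance_params: leaf tensors with requires_grad=True."""
    # Stage 2: refine geometry and approximate shading via differentiable rasterization.
    opt = torch.optim.Adam([geom_params, appearance_params], lr=lrs[0])
    for _ in range(steps[0]):
        opt.zero_grad()
        loss = torch.nn.functional.l1_loss(rasterize(geom_params, appearance_params), photo)
        loss.backward()
        opt.step()
    # Stage 3: refine only the fiber appearance via differentiable path tracing.
    opt = torch.optim.Adam([appearance_params], lr=lrs[1])
    for _ in range(steps[1]):
        opt.zero_grad()
        loss = torch.nn.functional.l1_loss(path_trace(geom_params, appearance_params), photo)
        loss.backward()
        opt.step()
    return geom_params, appearance_params
```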
{"title":"Fiber-level Woven Fabric Capture from a Single Photo","authors":"Zixuan Li, Pengfei Shen, Hanxiao Sun, Zibo Zhang, Yu Guo, Ligang Liu, Ling-Qi Yan, Steve Marschner, Milos Hasan, Beibei Wang","doi":"arxiv-2409.06368","DOIUrl":"https://doi.org/arxiv-2409.06368","url":null,"abstract":"Accurately rendering the appearance of fabrics is challenging, due to their\u0000complex 3D microstructures and specialized optical properties. If we model the\u0000geometry and optics of fabrics down to the fiber level, we can achieve\u0000unprecedented rendering realism, but this raises the difficulty of authoring or\u0000capturing the fiber-level assets. Existing approaches can obtain fiber-level\u0000geometry with special devices (e.g., CT) or complex hand-designed procedural\u0000pipelines (manually tweaking a set of parameters). In this paper, we propose a\u0000unified framework to capture fiber-level geometry and appearance of woven\u0000fabrics using a single low-cost microscope image. We first use a simple neural\u0000network to predict initial parameters of our geometric and appearance models.\u0000From this starting point, we further optimize the parameters of procedural\u0000fiber geometry and an approximated shading model via differentiable\u0000rasterization to match the microscope photo more accurately. Finally, we refine\u0000the fiber appearance parameters via differentiable path tracing, converging to\u0000accurate fiber optical parameters, which are suitable for physically-based\u0000light simulations to produce high-quality rendered results. We believe that our\u0000method is the first to utilize differentiable rendering at the microscopic\u0000level, supporting physically-based scattering from explicit fiber assemblies.\u0000Our fabric parameter estimation achieves high-quality re-rendering of measured\u0000woven fabric samples in both distant and close-up views. These results can\u0000further be used for efficient rendering or converted to downstream\u0000representations. We also propose a patch-space fiber geometry procedural\u0000generation and a two-scale path tracing framework for efficient rendering of\u0000fabric scenes.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image vectorization is the process of converting a raster image into a scalable vector graphic format. The objective is to effectively remove the pixelization effect while representing image boundaries with scalable parameterized curves. We propose a new image vectorization with depth, which considers depth ordering among shapes and uses curvature-based inpainting to convexify shapes during vectorization. From a given color-quantized raster image, we first define each connected component of the same color as a shape layer, and construct a depth ordering among the layers using a newly proposed depth-ordering energy. The global depth ordering among all shapes is described by a directed graph, and we propose an energy to remove cycles within the graph. After constructing the depth ordering of shapes, we convexify occluded regions by Euler's elastica curvature-based variational inpainting, and leverage the stability of the Modica-Mortola double-well potential energy to inpaint large regions. This follows human visual perception, in which shape boundaries are perceived to extend smoothly, and we assume shapes are likely to be convex. Finally, we fit Bézier curves to the boundaries and save the vectorization as an SVG file, which allows superposition of the curvature-based inpainted shapes following the depth ordering. This is a new way to vectorize images: decomposing an image into scalable shape layers with a computed depth ordering. This approach makes editing shapes and images more natural and intuitive. We also consider grouping shape layers for semantic vectorization. We present various numerical results and comparisons against recent layer-based vectorization methods to validate the proposed model.
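The global depth-ordering step can be pictured as a weighted directed graph over shape layers whose cycles must be broken before a consistent ordering exists. The sketch below uses a simple greedy rule (drop the costliest edge of each remaining cycle) in place of the paper's cycle-removal energy, and `pairwise_energy` is a stand-in for the proposed depth-ordering energy.

```python
import networkx as nx

def build_depth_order(num_layers, pairwise_energy):
    """pairwise_energy: dict {(i, j): cost of placing layer i in front of layer j}."""
    g = nx.DiGraph()
    g.add_nodes_from(range(num_layers))
    for (i, j), cost in pairwise_energy.items():
        # Keep only the cheaper direction for each unordered pair of layers.
        if cost < pairwise_energy.get((j, i), float("inf")):
            g.add_edge(i, j, weight=cost)
    # Break cycles greedily: remove the highest-cost edge of each remaining cycle.
    while True:
        try:
            cycle = nx.find_cycle(g)
        except nx.NetworkXNoCycle:
            break
        worst = max(cycle, key=lambda e: g.edges[e]["weight"])
        g.remove_edge(*worst[:2])
    return list(nx.topological_sort(g))  # a consistent front-to-back ordering
```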
{"title":"Image Vectorization with Depth: convexified shape layers with depth ordering","authors":"Ho Law, Sung Ha Kang","doi":"arxiv-2409.06648","DOIUrl":"https://doi.org/arxiv-2409.06648","url":null,"abstract":"Image vectorization is a process to convert a raster image into a scalable\u0000vector graphic format. Objective is to effectively remove the pixelization\u0000effect while representing boundaries of image by scaleable parameterized\u0000curves. We propose new image vectorization with depth which considers depth\u0000ordering among shapes and use curvature-based inpainting for convexifying\u0000shapes in vectorization process.From a given color quantized raster image, we\u0000first define each connected component of the same color as a shape layer, and\u0000construct depth ordering among them using a newly proposed depth ordering\u0000energy. Global depth ordering among all shapes is described by a directed\u0000graph, and we propose an energy to remove cycle within the graph. After\u0000constructing depth ordering of shapes, we convexify occluded regions by Euler's\u0000elastica curvature-based variational inpainting, and leverage on the stability\u0000of Modica-Mortola double-well potential energy to inpaint large regions. This\u0000is following human vision perception that boundaries of shapes extend smoothly,\u0000and we assume shapes are likely to be convex. Finally, we fit B'{e}zier curves\u0000to the boundaries and save vectorization as a SVG file which allows\u0000superposition of curvature-based inpainted shapes following the depth ordering.\u0000This is a new way to vectorize images, by decomposing an image into scalable\u0000shape layers with computed depth ordering. This approach makes editing shapes\u0000and images more natural and intuitive. We also consider grouping shape layers\u0000for semantic vectorization. We present various numerical results and\u0000comparisons against recent layer-based vectorization methods to validate the\u0000proposed model.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Pang, Zhongtian Zheng, Yilong Li, Guoping Wang, Peng-Shuai Wang
The discrete Laplacian operator holds a crucial role in 3D geometry processing, yet it is still challenging to define it on point clouds. Previous works mainly focused on constructing a local triangulation around each point to approximate the underlying manifold for defining the Laplacian operator, which may not be robust or accurate. In contrast, we simply use the K-nearest neighbors (KNN) graph constructed from the input point cloud and learn the Laplacian operator on the KNN graph with graph neural networks (GNNs). However, the ground-truth Laplacian operator is defined on a manifold mesh with a different connectivity from the KNN graph and thus cannot be directly used for training. To train the GNN, we propose a novel training scheme by imitating the behavior of the ground-truth Laplacian operator on a set of probe functions so that the learned Laplacian operator behaves similarly to the ground-truth Laplacian operator. We train our network on a subset of ShapeNet and evaluate it across a variety of point clouds. Compared with previous methods, our method reduces the error by an order of magnitude and excels in handling sparse point clouds with thin structures or sharp features. Our method also demonstrates a strong generalization ability to unseen shapes. With our learned Laplacian operator, we further apply a series of Laplacian-based geometry processing algorithms directly to point clouds and achieve accurate results, enabling many exciting possibilities for geometry processing on point clouds. The code and trained models are available at https://github.com/IntelligentGeometry/NeLo.
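The probe-function training scheme reduces to a simple objective: the learned operator, applied on the KNN graph, should act on probe functions the same way the ground-truth mesh Laplacian does. The sketch below assumes a dense ground-truth matrix and random Gaussian probes purely for illustration.

```python
import torch

def probe_imitation_loss(L_pred_apply, L_gt: torch.Tensor,
                         num_points: int, num_probes: int = 32) -> torch.Tensor:
    """
    L_pred_apply: callable mapping probe functions f of shape (N, P) to the
                  learned operator's output (N, P) on the KNN graph.
    L_gt: (N, N) ground-truth Laplacian transferred from the manifold mesh.
    """
    probes = torch.randn(num_points, num_probes)   # random probe functions on the points
    target = L_gt @ probes                         # ground-truth operator behavior
    pred = L_pred_apply(probes)                    # learned operator behavior
    return torch.nn.functional.mse_loss(pred, target)
```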
{"title":"Neural Laplacian Operator for 3D Point Clouds","authors":"Bo Pang, Zhongtian Zheng, Yilong Li, Guoping Wang, Peng-Shuai Wang","doi":"arxiv-2409.06506","DOIUrl":"https://doi.org/arxiv-2409.06506","url":null,"abstract":"The discrete Laplacian operator holds a crucial role in 3D geometry\u0000processing, yet it is still challenging to define it on point clouds. Previous\u0000works mainly focused on constructing a local triangulation around each point to\u0000approximate the underlying manifold for defining the Laplacian operator, which\u0000may not be robust or accurate. In contrast, we simply use the K-nearest\u0000neighbors (KNN) graph constructed from the input point cloud and learn the\u0000Laplacian operator on the KNN graph with graph neural networks (GNNs). However,\u0000the ground-truth Laplacian operator is defined on a manifold mesh with a\u0000different connectivity from the KNN graph and thus cannot be directly used for\u0000training. To train the GNN, we propose a novel training scheme by imitating the\u0000behavior of the ground-truth Laplacian operator on a set of probe functions so\u0000that the learned Laplacian operator behaves similarly to the ground-truth\u0000Laplacian operator. We train our network on a subset of ShapeNet and evaluate\u0000it across a variety of point clouds. Compared with previous methods, our method\u0000reduces the error by an order of magnitude and excels in handling sparse point\u0000clouds with thin structures or sharp features. Our method also demonstrates a\u0000strong generalization ability to unseen shapes. With our learned Laplacian\u0000operator, we further apply a series of Laplacian-based geometry processing\u0000algorithms directly to point clouds and achieve accurate results, enabling many\u0000exciting possibilities for geometry processing on point clouds. The code and\u0000trained models are available at https://github.com/IntelligentGeometry/NeLo.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri
We present a 3D modeling method that enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles representing compelling geometric details, taken from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries that adhere to the painted styles. To achieve such controllable and localized 3D detailization, we build on top of a Pyramid GAN by making it masking-aware. We devise novel structural losses and priors to ensure that our method preserves both desired coarse structures and fine-grained features even if the painted styles are borrowed from diverse sources, e.g., different semantic parts and even different shape categories. Through extensive experiments, we show that our ability to localize details enables novel interactive creative workflows and applications. Our experiments further demonstrate that, in comparison to prior techniques built on global detailization, our method generates structure-preserving, high-resolution stylized geometries with more coherent shape details and style transitions.
{"title":"DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement","authors":"Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri","doi":"arxiv-2409.06129","DOIUrl":"https://doi.org/arxiv-2409.06129","url":null,"abstract":"We present a 3D modeling method which enables end-users to refine or\u0000detailize 3D shapes using machine learning, expanding the capabilities of\u0000AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced\u0000with a simple box extrusion tool or via generative modeling), a user can\u0000directly \"paint\" desired target styles representing compelling geometric\u0000details, from input exemplar shapes, over different regions of the coarse\u0000shape. These regions are then up-sampled into high-resolution geometries which\u0000adhere with the painted styles. To achieve such controllable and localized 3D\u0000detailization, we build on top of a Pyramid GAN by making it masking-aware. We\u0000devise novel structural losses and priors to ensure that our method preserves\u0000both desired coarse structures and fine-grained features even if the painted\u0000styles are borrowed from diverse sources, e.g., different semantic parts and\u0000even different shape categories. Through extensive experiments, we show that\u0000our ability to localize details enables novel interactive creative workflows\u0000and applications. Our experiments further demonstrate that in comparison to\u0000prior techniques built on global detailization, our method generates\u0000structure-preserving, high-resolution stylized geometries with more coherent\u0000shape details and style transitions.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Farhan Rasheed, Abrar Naseer, Emma Nilsson, Talha Bin Masood, Ingrid Hotz
This paper presents a nested tracking framework for analyzing cycles in 2D force networks within granular materials. These materials are composed of interacting particles, whose interactions are described by a force network. Understanding the cycles within these networks at various scales and their evolution under external loads is crucial, as they significantly contribute to the mechanical and kinematic properties of the system. Our approach involves computing a cycle hierarchy by partitioning the 2D domain into segments bounded by cycles in the force network. We can adapt concepts from nested tracking graphs originally developed for merge trees by leveraging the duality between this partitioning and the cycles. We demonstrate the effectiveness of our method on two force networks derived from experiments with photoelastic disks.
{"title":"Multi-scale Cycle Tracking in Dynamic Planar Graphs","authors":"Farhan Rasheed, Abrar Naseer, Emma Nilsson, Talha Bin Masood, Ingrid Hotz","doi":"arxiv-2409.06476","DOIUrl":"https://doi.org/arxiv-2409.06476","url":null,"abstract":"This paper presents a nested tracking framework for analyzing cycles in 2D\u0000force networks within granular materials. These materials are composed of\u0000interacting particles, whose interactions are described by a force network.\u0000Understanding the cycles within these networks at various scales and their\u0000evolution under external loads is crucial, as they significantly contribute to\u0000the mechanical and kinematic properties of the system. Our approach involves\u0000computing a cycle hierarchy by partitioning the 2D domain into segments bounded\u0000by cycles in the force network. We can adapt concepts from nested tracking\u0000graphs originally developed for merge trees by leveraging the duality between\u0000this partitioning and the cycles. We demonstrate the effectiveness of our\u0000method on two force networks derived from experiments with photoelastic disks.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu
For audio-driven visual dubbing, it remains a considerable challenge to uphold and highlight the speaker's persona while synthesizing accurate lip synchronization. Existing methods fall short of capturing the speaker's unique speaking style or preserving facial details. In this paper, we present PersonaTalk, an attention-based two-stage framework, comprising geometry construction and face rendering, for high-fidelity and personalized visual dubbing. In the first stage, we propose a style-aware audio encoding module that injects speaking style into audio features through a cross-attention layer. The stylized audio features are then used to drive the speaker's template geometry to obtain lip-synced geometries. In the second stage, a dual-attention face renderer is introduced to render textures for the target geometries. It consists of two parallel cross-attention layers, namely Lip-Attention and Face-Attention, which respectively sample textures from different reference frames to render the entire face. With our innovative design, intricate facial details can be well preserved. Comprehensive experiments and user studies demonstrate our advantages over other state-of-the-art methods in terms of visual quality, lip-sync accuracy and persona preservation. Furthermore, as a person-generic framework, PersonaTalk achieves performance competitive with state-of-the-art person-specific methods. Project Page: https://grisoon.github.io/PersonaTalk/.
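The style-aware audio encoding module can be pictured as audio features attending to a bank of speaking-style tokens through cross-attention; the module layout and dimensions below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StyleAwareAudioEncoder(nn.Module):
    """Injects speaking style into per-frame audio features via cross-attention."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_feats: torch.Tensor, style_tokens: torch.Tensor) -> torch.Tensor:
        """audio_feats: (B, T, dim) audio features; style_tokens: (B, S, dim) style summary."""
        style_ctx, _ = self.cross_attn(query=audio_feats, key=style_tokens, value=style_tokens)
        return self.norm(audio_feats + style_ctx)   # stylized audio features
```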
{"title":"PersonaTalk: Bring Attention to Your Persona in Visual Dubbing","authors":"Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu","doi":"arxiv-2409.05379","DOIUrl":"https://doi.org/arxiv-2409.05379","url":null,"abstract":"For audio-driven visual dubbing, it remains a considerable challenge to\u0000uphold and highlight speaker's persona while synthesizing accurate lip\u0000synchronization. Existing methods fall short of capturing speaker's unique\u0000speaking style or preserving facial details. In this paper, we present\u0000PersonaTalk, an attention-based two-stage framework, including geometry\u0000construction and face rendering, for high-fidelity and personalized visual\u0000dubbing. In the first stage, we propose a style-aware audio encoding module\u0000that injects speaking style into audio features through a cross-attention\u0000layer. The stylized audio features are then used to drive speaker's template\u0000geometry to obtain lip-synced geometries. In the second stage, a dual-attention\u0000face renderer is introduced to render textures for the target geometries. It\u0000consists of two parallel cross-attention layers, namely Lip-Attention and\u0000Face-Attention, which respectively sample textures from different reference\u0000frames to render the entire face. With our innovative design, intricate facial\u0000details can be well preserved. Comprehensive experiments and user studies\u0000demonstrate our advantages over other state-of-the-art methods in terms of\u0000visual quality, lip-sync accuracy and persona preservation. Furthermore, as a\u0000person-generic framework, PersonaTalk can achieve competitive performance as\u0000state-of-the-art person-specific methods. Project Page:\u0000https://grisoon.github.io/PersonaTalk/.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}