Kartik Teotia, Hyeongwoo Kim, Pablo Garrido, Marc Habermann, Mohamed Elgharib, Christian Theobalt
Real-time rendering of human head avatars is a cornerstone of many computer graphics applications, such as augmented reality, video games, and films. Recent approaches address this challenge with computationally efficient geometry primitives in a carefully calibrated multi-view setup. Although these approaches produce photorealistic head renderings, they often fail to represent complex motion changes such as the mouth interior and strongly varying head poses. We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time. At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements. First, using rich facial features extracted from the raw input frames, we learn to deform the coarse facial geometry of a template mesh. We then initialize 3D Gaussians on the deformed surface and refine their positions in a fine step. We train this coarse-to-fine facial avatar model, with the head pose as a learnable parameter, in an end-to-end framework. This enables not only controllable facial animation via video inputs, but also high-fidelity novel view synthesis of challenging facial expressions, such as tongue deformations and fine-grained teeth structure under large motion changes. It also encourages the learned head avatar to generalize to new facial expressions and head poses at inference time. We demonstrate the performance of our method through comparisons against related methods on different datasets spanning challenging facial expression sequences across multiple identities. We also demonstrate a potential application of our approach: cross-identity facial performance transfer.
{"title":"GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations","authors":"Kartik Teotia, Hyeongwoo Kim, Pablo Garrido, Marc Habermann, Mohamed Elgharib, Christian Theobalt","doi":"arxiv-2409.11951","DOIUrl":"https://doi.org/arxiv-2409.11951","url":null,"abstract":"Real-time rendering of human head avatars is a cornerstone of many computer\u0000graphics applications, such as augmented reality, video games, and films, to\u0000name a few. Recent approaches address this challenge with computationally\u0000efficient geometry primitives in a carefully calibrated multi-view setup.\u0000Albeit producing photorealistic head renderings, it often fails to represent\u0000complex motion changes such as the mouth interior and strongly varying head\u0000poses. We propose a new method to generate highly dynamic and deformable human\u0000head avatars from multi-view imagery in real-time. At the core of our method is\u0000a hierarchical representation of head models that allows to capture the complex\u0000dynamics of facial expressions and head movements. First, with rich facial\u0000features extracted from raw input frames, we learn to deform the coarse facial\u0000geometry of the template mesh. We then initialize 3D Gaussians on the deformed\u0000surface and refine their positions in a fine step. We train this coarse-to-fine\u0000facial avatar model along with the head pose as a learnable parameter in an\u0000end-to-end framework. This enables not only controllable facial animation via\u0000video inputs, but also high-fidelity novel view synthesis of challenging facial\u0000expressions, such as tongue deformations and fine-grained teeth structure under\u0000large motion changes. Moreover, it encourages the learned head avatar to\u0000generalize towards new facial expressions and head poses at inference time. We\u0000demonstrate the performance of our method with comparisons against the related\u0000methods on different datasets, spanning challenging facial expression sequences\u0000across multiple identities. We also show the potential application of our\u0000approach by demonstrating a cross-identity facial performance transfer\u0000application.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating and updating pixel art character sprites with many frames spanning different animations and poses takes time and can quickly become repetitive. However, the process can be partially automated, allowing artists to focus on more creative tasks. In this work, we concentrate on generating a pixel art character sprite in a target pose from images of the character facing the other three directions. We present a novel approach to character generation by framing the problem as a missing data imputation task. Our generative adversarial network model receives the images of a character in all available domains and produces the image of the missing pose. We evaluate our approach in scenarios with one, two, and three missing images, achieving results similar to or better than the state of the art when more images are available. We also evaluate the impact of the proposed changes to the base architecture.
{"title":"A Missing Data Imputation GAN for Character Sprite Generation","authors":"Flávio Coutinho, Luiz Chaimowicz","doi":"arxiv-2409.10721","DOIUrl":"https://doi.org/arxiv-2409.10721","url":null,"abstract":"Creating and updating pixel art character sprites with many frames spanning\u0000different animations and poses takes time and can quickly become repetitive.\u0000However, that can be partially automated to allow artists to focus on more\u0000creative tasks. In this work, we concentrate on creating pixel art character\u0000sprites in a target pose from images of them facing other three directions. We\u0000present a novel approach to character generation by framing the problem as a\u0000missing data imputation task. Our proposed generative adversarial networks\u0000model receives the images of a character in all available domains and produces\u0000the image of the missing pose. We evaluated our approach in the scenarios with\u0000one, two, and three missing images, achieving similar or better results to the\u0000state-of-the-art when more images are available. We also evaluate the impact of\u0000the proposed changes to the base architecture.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"101 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose two novel ideas, the adoption of deferred rendering and a mesh-based representation, to improve the quality of 3D Gaussian splatting (3DGS)-based inverse rendering. We first report a problem caused by hidden Gaussians, where Gaussians beneath the surface adversely affect the pixel color in the volume rendering adopted by existing methods. To resolve this problem, we propose applying deferred rendering and report new problems that arise when deferred rendering is naively applied to existing 3DGS-based inverse rendering. To improve the quality of 3DGS-based inverse rendering under deferred rendering, we propose a novel two-step training approach that (1) exploits mesh extraction and utilizes a hybrid mesh-3DGS representation and (2) applies novel regularization methods to better exploit the mesh. Our experiments show that, under relighting, the proposed method offers significantly better rendering quality than existing 3DGS-based inverse rendering methods. Compared with the state-of-the-art voxel grid-based inverse rendering method, it gives better rendering quality while offering real-time rendering.
{"title":"Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering","authors":"Euntae Choi, Sungjoo Yoo","doi":"arxiv-2409.10335","DOIUrl":"https://doi.org/arxiv-2409.10335","url":null,"abstract":"We propose two novel ideas (adoption of deferred rendering and mesh-based\u0000representation) to improve the quality of 3D Gaussian splatting (3DGS) based\u0000inverse rendering. We first report a problem incurred by hidden Gaussians,\u0000where Gaussians beneath the surface adversely affect the pixel color in the\u0000volume rendering adopted by the existing methods. In order to resolve the\u0000problem, we propose applying deferred rendering and report new problems\u0000incurred in a naive application of deferred rendering to the existing\u00003DGS-based inverse rendering. In an effort to improve the quality of 3DGS-based\u0000inverse rendering under deferred rendering, we propose a novel two-step\u0000training approach which (1) exploits mesh extraction and utilizes a hybrid\u0000mesh-3DGS representation and (2) applies novel regularization methods to better\u0000exploit the mesh. Our experiments show that, under relighting, the proposed\u0000method offers significantly better rendering quality than the existing\u00003DGS-based inverse rendering methods. Compared with the SOTA voxel grid-based\u0000inverse rendering method, it gives better rendering quality while offering\u0000real-time rendering.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko, Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li
We introduce Playground v3 (PGv3), our latest text-to-image model, which achieves state-of-the-art (SoTA) performance across multiple testing benchmarks, excels in graphic design abilities, and introduces new capabilities. Unlike traditional text-to-image generative models that rely on pre-trained language models such as T5 or CLIP text encoders, our approach fully integrates Large Language Models (LLMs) with a novel structure that derives text conditions exclusively from a decoder-only LLM. Additionally, to enhance image captioning quality, we developed an in-house captioner capable of generating captions with varying levels of detail, enriching the diversity of text structures. We also introduce a new benchmark, CapsBench, to evaluate detailed image captioning performance. Experimental results demonstrate that PGv3 excels in text prompt adherence, complex reasoning, and accurate text rendering. User preference studies indicate the super-human graphic design ability of our model for common design applications, such as stickers, posters, and logo designs. Furthermore, PGv3 introduces new capabilities, including precise RGB color control and robust multilingual understanding.
{"title":"Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models","authors":"Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko, Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li","doi":"arxiv-2409.10695","DOIUrl":"https://doi.org/arxiv-2409.10695","url":null,"abstract":"We introduce Playground v3 (PGv3), our latest text-to-image model that\u0000achieves state-of-the-art (SoTA) performance across multiple testing\u0000benchmarks, excels in graphic design abilities and introduces new capabilities.\u0000Unlike traditional text-to-image generative models that rely on pre-trained\u0000language models like T5 or CLIP text encoders, our approach fully integrates\u0000Large Language Models (LLMs) with a novel structure that leverages text\u0000conditions exclusively from a decoder-only LLM. Additionally, to enhance image\u0000captioning quality-we developed an in-house captioner, capable of generating\u0000captions with varying levels of detail, enriching the diversity of text\u0000structures. We also introduce a new benchmark CapsBench to evaluate detailed\u0000image captioning performance. Experimental results demonstrate that PGv3 excels\u0000in text prompt adherence, complex reasoning, and accurate text rendering. User\u0000preference studies indicate the super-human graphic design ability of our model\u0000for common design applications, such as stickers, posters, and logo designs.\u0000Furthermore, PGv3 introduces new capabilities, including precise RGB color\u0000control and robust multilingual understanding.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"99 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Palamarchuk, Lemara Williams, Brian Mayer, Thomas Danielson, Rebecca Faust, Larry Deschaine, Chris North
Dynamic topic modeling is useful for discovering how latent topics develop and change over time. However, current methods rely on algorithms that separate document and word representations. This prevents the creation of a meaningful embedding space in which changes in word usage and documents can be directly analyzed in a temporal context. This paper proposes an extension of the compass-aligned temporal Word2Vec methodology to dynamic topic modeling. Such a method allows the direct comparison of word and document embeddings across time in dynamic topics, enabling visualizations that incorporate temporal word embeddings, within the context of documents, into topic visualizations. In experiments against the current state of the art, our proposed method demonstrates overall competitive performance in topic relevancy and diversity across temporal datasets of varying size. At the same time, it provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time.
{"title":"Visualizing Temporal Topic Embeddings with a Compass","authors":"Daniel Palamarchuk, Lemara Williams, Brian Mayer, Thomas Danielson, Rebecca Faust, Larry Deschaine, Chris North","doi":"arxiv-2409.10649","DOIUrl":"https://doi.org/arxiv-2409.10649","url":null,"abstract":"Dynamic topic modeling is useful at discovering the development and change in\u0000latent topics over time. However, present methodology relies on algorithms that\u0000separate document and word representations. This prevents the creation of a\u0000meaningful embedding space where changes in word usage and documents can be\u0000directly analyzed in a temporal context. This paper proposes an expansion of\u0000the compass-aligned temporal Word2Vec methodology into dynamic topic modeling.\u0000Such a method allows for the direct comparison of word and document embeddings\u0000across time in dynamic topics. This enables the creation of visualizations that\u0000incorporate temporal word embeddings within the context of documents into topic\u0000visualizations. In experiments against the current state-of-the-art, our\u0000proposed method demonstrates overall competitive performance in topic relevancy\u0000and diversity across temporal datasets of varying size. Simultaneously, it\u0000provides insightful visualizations focused on temporal word embeddings while\u0000maintaining the insights provided by global topic evolution, advancing our\u0000understanding of how topics evolve over time.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Animating character drawings is an engaging visual content creation task. Given a single character drawing, existing animation methods are limited to flat 2D motions and thus lack 3D effects. An alternative solution is to reconstruct a 3D model from a character drawing as a proxy and then retarget 3D motion data onto it. However, existing image-to-3D methods do not work well for amateur character drawings in terms of appearance and geometry. We observe that the contour lines commonly found in character drawings introduce significant ambiguity in texture synthesis due to their view dependence. Additionally, thin regions represented by single-line contours (e.g., the slim limbs of a stick figure) are difficult to reconstruct due to their delicate structure. To address these issues, we propose a novel system, DrawingSpinUp, to produce plausible 3D animations and breathe life into character drawings, allowing them to freely spin up, leap, and even perform a hip-hop dance. For appearance improvement, we adopt a removal-then-restoration strategy that first removes the view-dependent contour lines and then renders them back after retargeting the reconstructed character. For geometry refinement, we develop a skeleton-based thinning deformation algorithm to refine the slim structures represented by the single-line contours. Experimental evaluations and a perceptual user study show that our proposed method outperforms existing 2D and 3D animation methods and generates high-quality 3D animations from a single character drawing. Please refer to our project page (https://lordliang.github.io/DrawingSpinUp) for the code and generated animations.
{"title":"DrawingSpinUp: 3D Animation from Single Character Drawings","authors":"Jie Zhou, Chufeng Xiao, Miu-Ling Lam, Hongbo Fu","doi":"arxiv-2409.08615","DOIUrl":"https://doi.org/arxiv-2409.08615","url":null,"abstract":"Animating various character drawings is an engaging visual content creation\u0000task. Given a single character drawing, existing animation methods are limited\u0000to flat 2D motions and thus lack 3D effects. An alternative solution is to\u0000reconstruct a 3D model from a character drawing as a proxy and then retarget 3D\u0000motion data onto it. However, the existing image-to-3D methods could not work\u0000well for amateur character drawings in terms of appearance and geometry. We\u0000observe the contour lines, commonly existing in character drawings, would\u0000introduce significant ambiguity in texture synthesis due to their\u0000view-dependence. Additionally, thin regions represented by single-line contours\u0000are difficult to reconstruct (e.g., slim limbs of a stick figure) due to their\u0000delicate structures. To address these issues, we propose a novel system,\u0000DrawingSpinUp, to produce plausible 3D animations and breathe life into\u0000character drawings, allowing them to freely spin up, leap, and even perform a\u0000hip-hop dance. For appearance improvement, we adopt a removal-then-restoration\u0000strategy to first remove the view-dependent contour lines and then render them\u0000back after retargeting the reconstructed character. For geometry refinement, we\u0000develop a skeleton-based thinning deformation algorithm to refine the slim\u0000structures represented by the single-line contours. The experimental\u0000evaluations and a perceptual user study show that our proposed method\u0000outperforms the existing 2D and 3D animation methods and generates high-quality\u00003D animations from a single character drawing. Please refer to our project page\u0000(https://lordliang.github.io/DrawingSpinUp) for the code and generated\u0000animations.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"122 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yohan Poirier-Ginter, Alban Gauthier, Julien Phillip, Jean-Francois Lalonde, George Drettakis
Relighting radiance fields is severely underconstrained for multi-view data, which is most often captured under a single illumination condition; it is especially hard for full scenes containing multiple objects. We introduce a method to create relightable radiance fields from such single-illumination data by exploiting priors extracted from 2D image diffusion models. We first fine-tune a 2D diffusion model on a multi-illumination dataset conditioned on light direction, allowing us to augment a single-illumination capture into a realistic, but possibly inconsistent, multi-illumination dataset with directly defined light directions. We use this augmented data to create a relightable radiance field represented by 3D Gaussian splats. To allow direct control of light direction for low-frequency lighting, we represent appearance with a multi-layer perceptron parameterized on light direction. To enforce multi-view consistency and overcome inaccuracies, we optimize a per-image auxiliary feature vector. We show results on synthetic and real multi-view data under single illumination, demonstrating that our method successfully exploits 2D diffusion model priors to allow realistic 3D relighting of complete scenes. Project site: https://repo-sam.inria.fr/fungraph/generative-radiance-field-relighting/
{"title":"A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis","authors":"Yohan Poirier-Ginter, Alban Gauthier, Julien Phillip, Jean-Francois Lalonde, George Drettakis","doi":"arxiv-2409.08947","DOIUrl":"https://doi.org/arxiv-2409.08947","url":null,"abstract":"Relighting radiance fields is severely underconstrained for multi-view data,\u0000which is most often captured under a single illumination condition; It is\u0000especially hard for full scenes containing multiple objects. We introduce a\u0000method to create relightable radiance fields using such single-illumination\u0000data by exploiting priors extracted from 2D image diffusion models. We first\u0000fine-tune a 2D diffusion model on a multi-illumination dataset conditioned by\u0000light direction, allowing us to augment a single-illumination capture into a\u0000realistic -- but possibly inconsistent -- multi-illumination dataset from\u0000directly defined light directions. We use this augmented data to create a\u0000relightable radiance field represented by 3D Gaussian splats. To allow direct\u0000control of light direction for low-frequency lighting, we represent appearance\u0000with a multi-layer perceptron parameterized on light direction. To enforce\u0000multi-view consistency and overcome inaccuracies we optimize a per-image\u0000auxiliary feature vector. We show results on synthetic and real multi-view data\u0000under single illumination, demonstrating that our method successfully exploits\u00002D diffusion model priors to allow realistic 3D relighting for complete scenes.\u0000Project site\u0000https://repo-sam.inria.fr/fungraph/generative-radiance-field-relighting/","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"105 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that achieves high-quality reconstruction and real-time rendering of complex scenes. However, its rasterization pipeline still suffers from unnecessary overhead caused by avoidable serial Gaussian culling, and from uneven load due to the varying number of Gaussians to be rendered across pixels, which hinders wider adoption of 3DGS. To accelerate Gaussian splatting, we propose AdR-Gaussian, which moves part of the serial culling in the Render stage into the earlier Preprocess stage to enable parallel culling, employs an adaptive radius to narrow the rendering pixel range of each Gaussian, and introduces a load balancing method to minimize thread waiting time during pixel-parallel rendering. Our contributions are threefold, achieving a rendering speed of 310% while maintaining quality equivalent to or better than the state of the art. First, we propose to cull Gaussian-tile pairs of low splatting opacity early, based on an adaptive radius in the Gaussian-parallel Preprocess stage; this reduces the number of affected tiles through the Gaussian bounding circle, cutting unnecessary overhead and achieving faster rendering. Second, we further propose early culling based on an axis-aligned bounding box for Gaussian splatting, which achieves a more significant reduction in wasted computation by accurately calculating the Gaussian extent along the two 2D directions. Third, we propose a balancing algorithm for pixel thread load, which compresses the information of heavily loaded pixels to reduce thread waiting time and enhances the information of lightly loaded pixels to hedge against rendering quality loss. Experiments on three datasets demonstrate that our algorithm significantly improves Gaussian splatting rendering speed.
{"title":"AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius","authors":"Xinzhe Wang, Ran Yi, Lizhuang Ma","doi":"arxiv-2409.08669","DOIUrl":"https://doi.org/arxiv-2409.08669","url":null,"abstract":"3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has\u0000achieved high-quality reconstruction and real-time rendering of complex scenes.\u0000However, the rasterization pipeline still suffers from unnecessary overhead\u0000resulting from avoidable serial Gaussian culling, and uneven load due to the\u0000distinct number of Gaussian to be rendered across pixels, which hinders wider\u0000promotion and application of 3DGS. In order to accelerate Gaussian splatting,\u0000we propose AdR-Gaussian, which moves part of serial culling in Render stage\u0000into the earlier Preprocess stage to enable parallel culling, employing\u0000adaptive radius to narrow the rendering pixel range for each Gaussian, and\u0000introduces a load balancing method to minimize thread waiting time during the\u0000pixel-parallel rendering. Our contributions are threefold, achieving a\u0000rendering speed of 310% while maintaining equivalent or even better quality\u0000than the state-of-the-art. Firstly, we propose to early cull Gaussian-Tile\u0000pairs of low splatting opacity based on an adaptive radius in the\u0000Gaussian-parallel Preprocess stage, which reduces the number of affected tile\u0000through the Gaussian bounding circle, thus reducing unnecessary overhead and\u0000achieving faster rendering speed. Secondly, we further propose early culling\u0000based on axis-aligned bounding box for Gaussian splatting, which achieves a\u0000more significant reduction in ineffective expenses by accurately calculating\u0000the Gaussian size in the 2D directions. Thirdly, we propose a balancing\u0000algorithm for pixel thread load, which compresses the information of heavy-load\u0000pixels to reduce thread waiting time, and enhance information of light-load\u0000pixels to hedge against rendering quality loss. Experiments on three datasets\u0000demonstrate that our algorithm can significantly improve the Gaussian Splatting\u0000rendering speed.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between the digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impede broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time, high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to represent motion and appearance separately using corresponding joint and skin Gaussians. Such explicit disentanglement significantly reduces motion redundancy and enhances temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. We then employ a coarse-to-fine training strategy for frame-by-frame human performance modeling, which includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350 KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
{"title":"Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos","authors":"Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu","doi":"arxiv-2409.08353","DOIUrl":"https://doi.org/arxiv-2409.08353","url":null,"abstract":"Volumetric video represents a transformative advancement in visual media,\u0000enabling users to freely navigate immersive virtual experiences and narrowing\u0000the gap between digital and real worlds. However, the need for extensive manual\u0000intervention to stabilize mesh sequences and the generation of excessively\u0000large assets in existing workflows impedes broader adoption. In this paper, we\u0000present a novel Gaussian-based approach, dubbed textit{DualGS}, for real-time\u0000and high-fidelity playback of complex human performance with excellent\u0000compression ratios. Our key idea in DualGS is to separately represent motion\u0000and appearance using the corresponding skin and joint Gaussians. Such an\u0000explicit disentanglement can significantly reduce motion redundancy and enhance\u0000temporal coherence. We begin by initializing the DualGS and anchoring skin\u0000Gaussians to joint Gaussians at the first frame. Subsequently, we employ a\u0000coarse-to-fine training strategy for frame-by-frame human performance modeling.\u0000It includes a coarse alignment phase for overall motion prediction as well as a\u0000fine-grained optimization for robust tracking and high-fidelity rendering. To\u0000integrate volumetric video seamlessly into VR environments, we efficiently\u0000compress motion using entropy encoding and appearance using codec compression\u0000coupled with a persistent codebook. Our approach achieves a compression ratio\u0000of up to 120 times, only requiring approximately 350KB of storage per frame. We\u0000demonstrate the efficacy of our representation through photo-realistic,\u0000free-view experiences on VR headsets, enabling users to immersively watch\u0000musicians in performance and feel the rhythm of the notes at the performers'\u0000fingertips.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel-view synthesis based on visible light has been extensively studied. Compared to visible light imaging, thermal infrared imaging offers the advantages of all-weather imaging and strong penetration, providing increased possibilities for reconstruction in nighttime and adverse weather scenarios. However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes and manifesting as floaters and indistinct edge features in synthesized images. To address these limitations, this paper introduces a physics-induced 3D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by modeling atmospheric transmission effects and thermal conduction in three-dimensional media using neural networks. Additionally, a temperature consistency constraint is incorporated into the optimization objective to enhance the reconstruction accuracy of thermal infrared images. Furthermore, to validate the effectiveness of our method, we create the first large-scale benchmark dataset for this field, the Thermal Infrared Novel-view Synthesis Dataset (TI-NSD). This dataset comprises 20 authentic thermal infrared video scenes, covering indoor, outdoor, and UAV (unmanned aerial vehicle) scenarios, totaling 6,664 frames of thermal infrared image data. Based on this dataset, this paper experimentally verifies the effectiveness of Thermal3D-GS. The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR and significantly addresses the floaters and indistinct edge features present in the baseline. Our dataset and codebase will be released at https://github.com/mzzcdf/Thermal3DGS.
{"title":"Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis","authors":"Qian Chen, Shihao Shu, Xiangzhi Bai","doi":"arxiv-2409.08042","DOIUrl":"https://doi.org/arxiv-2409.08042","url":null,"abstract":"Novel-view synthesis based on visible light has been extensively studied. In\u0000comparison to visible light imaging, thermal infrared imaging offers the\u0000advantage of all-weather imaging and strong penetration, providing increased\u0000possibilities for reconstruction in nighttime and adverse weather scenarios.\u0000However, thermal infrared imaging is influenced by physical characteristics\u0000such as atmospheric transmission effects and thermal conduction, hindering the\u0000precise reconstruction of intricate details in thermal infrared scenes,\u0000manifesting as issues of floaters and indistinct edge features in synthesized\u0000images. To address these limitations, this paper introduces a physics-induced\u00003D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by\u0000modeling atmospheric transmission effects and thermal conduction in\u0000three-dimensional media using neural networks. Additionally, a temperature\u0000consistency constraint is incorporated into the optimization objective to\u0000enhance the reconstruction accuracy of thermal infrared images. Furthermore, to\u0000validate the effectiveness of our method, the first large-scale benchmark\u0000dataset for this field named Thermal Infrared Novel-view Synthesis Dataset\u0000(TI-NSD) is created. This dataset comprises 20 authentic thermal infrared video\u0000scenes, covering indoor, outdoor, and UAV(Unmanned Aerial Vehicle) scenarios,\u0000totaling 6,664 frames of thermal infrared image data. Based on this dataset,\u0000this paper experimentally verifies the effectiveness of Thermal3D-GS. The\u0000results indicate that our method outperforms the baseline method with a 3.03 dB\u0000improvement in PSNR and significantly addresses the issues of floaters and\u0000indistinct edge features present in the baseline method. Our dataset and\u0000codebase will be released in\u0000href{https://github.com/mzzcdf/Thermal3DGS}{textcolor{red}{Thermal3DGS}}.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}