Kartik Teotia, Hyeongwoo Kim, Pablo Garrido, Marc Habermann, Mohamed Elgharib, Christian Theobalt
Real-time rendering of human head avatars is a cornerstone of many computer graphics applications, such as augmented reality, video games, and films. Recent approaches address this challenge with computationally efficient geometry primitives in a carefully calibrated multi-view setup. Although these approaches produce photorealistic head renderings, they often fail to represent complex motion changes such as the mouth interior and strongly varying head poses. We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real time. At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements. First, using rich facial features extracted from the raw input frames, we learn to deform the coarse facial geometry of a template mesh. We then initialize 3D Gaussians on the deformed surface and refine their positions in a fine step. We train this coarse-to-fine facial avatar model, with the head pose as a learnable parameter, in an end-to-end framework. This enables not only controllable facial animation via video inputs, but also high-fidelity novel view synthesis of challenging facial expressions, such as tongue deformations and fine-grained teeth structure under large motion changes. It also encourages the learned head avatar to generalize to new facial expressions and head poses at inference time. We demonstrate the performance of our method through comparisons against related methods on different datasets spanning challenging facial expression sequences across multiple identities. We also demonstrate a potential application of our approach: cross-identity facial performance transfer.
{"title":"GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations","authors":"Kartik Teotia, Hyeongwoo Kim, Pablo Garrido, Marc Habermann, Mohamed Elgharib, Christian Theobalt","doi":"arxiv-2409.11951","DOIUrl":"https://doi.org/arxiv-2409.11951","url":null,"abstract":"Real-time rendering of human head avatars is a cornerstone of many computer\u0000graphics applications, such as augmented reality, video games, and films, to\u0000name a few. Recent approaches address this challenge with computationally\u0000efficient geometry primitives in a carefully calibrated multi-view setup.\u0000Albeit producing photorealistic head renderings, it often fails to represent\u0000complex motion changes such as the mouth interior and strongly varying head\u0000poses. We propose a new method to generate highly dynamic and deformable human\u0000head avatars from multi-view imagery in real-time. At the core of our method is\u0000a hierarchical representation of head models that allows to capture the complex\u0000dynamics of facial expressions and head movements. First, with rich facial\u0000features extracted from raw input frames, we learn to deform the coarse facial\u0000geometry of the template mesh. We then initialize 3D Gaussians on the deformed\u0000surface and refine their positions in a fine step. We train this coarse-to-fine\u0000facial avatar model along with the head pose as a learnable parameter in an\u0000end-to-end framework. This enables not only controllable facial animation via\u0000video inputs, but also high-fidelity novel view synthesis of challenging facial\u0000expressions, such as tongue deformations and fine-grained teeth structure under\u0000large motion changes. Moreover, it encourages the learned head avatar to\u0000generalize towards new facial expressions and head poses at inference time. We\u0000demonstrate the performance of our method with comparisons against the related\u0000methods on different datasets, spanning challenging facial expression sequences\u0000across multiple identities. We also show the potential application of our\u0000approach by demonstrating a cross-identity facial performance transfer\u0000application.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating and updating pixel art character sprites with many frames spanning different animations and poses takes time and can quickly become repetitive. However, the process can be partially automated, allowing artists to focus on more creative tasks. In this work, we concentrate on generating a pixel art character sprite in a target pose from images of the character facing the other three directions. We present a novel approach to character generation by framing the problem as a missing data imputation task. Our generative adversarial network model receives the images of a character in all available domains and produces the image of the missing pose. We evaluate our approach in scenarios with one, two, and three missing images, achieving results similar to or better than the state of the art when more images are available. We also evaluate the impact of the proposed changes to the base architecture.
{"title":"A Missing Data Imputation GAN for Character Sprite Generation","authors":"Flávio Coutinho, Luiz Chaimowicz","doi":"arxiv-2409.10721","DOIUrl":"https://doi.org/arxiv-2409.10721","url":null,"abstract":"Creating and updating pixel art character sprites with many frames spanning\u0000different animations and poses takes time and can quickly become repetitive.\u0000However, that can be partially automated to allow artists to focus on more\u0000creative tasks. In this work, we concentrate on creating pixel art character\u0000sprites in a target pose from images of them facing other three directions. We\u0000present a novel approach to character generation by framing the problem as a\u0000missing data imputation task. Our proposed generative adversarial networks\u0000model receives the images of a character in all available domains and produces\u0000the image of the missing pose. We evaluated our approach in the scenarios with\u0000one, two, and three missing images, achieving similar or better results to the\u0000state-of-the-art when more images are available. We also evaluate the impact of\u0000the proposed changes to the base architecture.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"101 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose two novel ideas, the adoption of deferred rendering and a mesh-based representation, to improve the quality of 3D Gaussian splatting (3DGS)-based inverse rendering. We first report a problem caused by hidden Gaussians, where Gaussians beneath the surface adversely affect the pixel color in the volume rendering adopted by existing methods. To resolve this problem, we propose applying deferred rendering and report new problems that arise when deferred rendering is naively applied to existing 3DGS-based inverse rendering. To improve the quality of 3DGS-based inverse rendering under deferred rendering, we propose a novel two-step training approach that (1) exploits mesh extraction and utilizes a hybrid mesh-3DGS representation and (2) applies novel regularization methods to better exploit the mesh. Our experiments show that, under relighting, the proposed method offers significantly better rendering quality than existing 3DGS-based inverse rendering methods. Compared with the state-of-the-art voxel grid-based inverse rendering method, it gives better rendering quality while offering real-time rendering.
{"title":"Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering","authors":"Euntae Choi, Sungjoo Yoo","doi":"arxiv-2409.10335","DOIUrl":"https://doi.org/arxiv-2409.10335","url":null,"abstract":"We propose two novel ideas (adoption of deferred rendering and mesh-based\u0000representation) to improve the quality of 3D Gaussian splatting (3DGS) based\u0000inverse rendering. We first report a problem incurred by hidden Gaussians,\u0000where Gaussians beneath the surface adversely affect the pixel color in the\u0000volume rendering adopted by the existing methods. In order to resolve the\u0000problem, we propose applying deferred rendering and report new problems\u0000incurred in a naive application of deferred rendering to the existing\u00003DGS-based inverse rendering. In an effort to improve the quality of 3DGS-based\u0000inverse rendering under deferred rendering, we propose a novel two-step\u0000training approach which (1) exploits mesh extraction and utilizes a hybrid\u0000mesh-3DGS representation and (2) applies novel regularization methods to better\u0000exploit the mesh. Our experiments show that, under relighting, the proposed\u0000method offers significantly better rendering quality than the existing\u00003DGS-based inverse rendering methods. Compared with the SOTA voxel grid-based\u0000inverse rendering method, it gives better rendering quality while offering\u0000real-time rendering.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko, Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li
We introduce Playground v3 (PGv3), our latest text-to-image model, which achieves state-of-the-art (SoTA) performance across multiple testing benchmarks, excels in graphic design abilities, and introduces new capabilities. Unlike traditional text-to-image generative models that rely on pre-trained language models such as T5 or CLIP text encoders, our approach fully integrates Large Language Models (LLMs) with a novel structure that derives text conditions exclusively from a decoder-only LLM. Additionally, to enhance image captioning quality, we developed an in-house captioner capable of generating captions with varying levels of detail, enriching the diversity of text structures. We also introduce a new benchmark, CapsBench, to evaluate detailed image captioning performance. Experimental results demonstrate that PGv3 excels in text prompt adherence, complex reasoning, and accurate text rendering. User preference studies indicate the super-human graphic design ability of our model for common design applications, such as stickers, posters, and logo designs. Furthermore, PGv3 introduces new capabilities, including precise RGB color control and robust multilingual understanding.
{"title":"Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models","authors":"Bingchen Liu, Ehsan Akhgari, Alexander Visheratin, Aleks Kamko, Linmiao Xu, Shivam Shrirao, Joao Souza, Suhail Doshi, Daiqing Li","doi":"arxiv-2409.10695","DOIUrl":"https://doi.org/arxiv-2409.10695","url":null,"abstract":"We introduce Playground v3 (PGv3), our latest text-to-image model that\u0000achieves state-of-the-art (SoTA) performance across multiple testing\u0000benchmarks, excels in graphic design abilities and introduces new capabilities.\u0000Unlike traditional text-to-image generative models that rely on pre-trained\u0000language models like T5 or CLIP text encoders, our approach fully integrates\u0000Large Language Models (LLMs) with a novel structure that leverages text\u0000conditions exclusively from a decoder-only LLM. Additionally, to enhance image\u0000captioning quality-we developed an in-house captioner, capable of generating\u0000captions with varying levels of detail, enriching the diversity of text\u0000structures. We also introduce a new benchmark CapsBench to evaluate detailed\u0000image captioning performance. Experimental results demonstrate that PGv3 excels\u0000in text prompt adherence, complex reasoning, and accurate text rendering. User\u0000preference studies indicate the super-human graphic design ability of our model\u0000for common design applications, such as stickers, posters, and logo designs.\u0000Furthermore, PGv3 introduces new capabilities, including precise RGB color\u0000control and robust multilingual understanding.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"99 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Palamarchuk, Lemara Williams, Brian Mayer, Thomas Danielson, Rebecca Faust, Larry Deschaine, Chris North
Dynamic topic modeling is useful for discovering how latent topics develop and change over time. However, current methods rely on algorithms that separate document and word representations. This prevents the creation of a meaningful embedding space in which changes in word usage and documents can be directly analyzed in a temporal context. This paper proposes an extension of the compass-aligned temporal Word2Vec methodology to dynamic topic modeling. Such a method allows the direct comparison of word and document embeddings across time in dynamic topics, enabling visualizations that incorporate temporal word embeddings, within the context of documents, into topic visualizations. In experiments against the current state of the art, our proposed method demonstrates overall competitive performance in topic relevancy and diversity across temporal datasets of varying size. At the same time, it provides insightful visualizations focused on temporal word embeddings while maintaining the insights provided by global topic evolution, advancing our understanding of how topics evolve over time.
{"title":"Visualizing Temporal Topic Embeddings with a Compass","authors":"Daniel Palamarchuk, Lemara Williams, Brian Mayer, Thomas Danielson, Rebecca Faust, Larry Deschaine, Chris North","doi":"arxiv-2409.10649","DOIUrl":"https://doi.org/arxiv-2409.10649","url":null,"abstract":"Dynamic topic modeling is useful at discovering the development and change in\u0000latent topics over time. However, present methodology relies on algorithms that\u0000separate document and word representations. This prevents the creation of a\u0000meaningful embedding space where changes in word usage and documents can be\u0000directly analyzed in a temporal context. This paper proposes an expansion of\u0000the compass-aligned temporal Word2Vec methodology into dynamic topic modeling.\u0000Such a method allows for the direct comparison of word and document embeddings\u0000across time in dynamic topics. This enables the creation of visualizations that\u0000incorporate temporal word embeddings within the context of documents into topic\u0000visualizations. In experiments against the current state-of-the-art, our\u0000proposed method demonstrates overall competitive performance in topic relevancy\u0000and diversity across temporal datasets of varying size. Simultaneously, it\u0000provides insightful visualizations focused on temporal word embeddings while\u0000maintaining the insights provided by global topic evolution, advancing our\u0000understanding of how topics evolve over time.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Animating character drawings is an engaging visual content creation task. Given a single character drawing, existing animation methods are limited to flat 2D motions and thus lack 3D effects. An alternative solution is to reconstruct a 3D model from a character drawing as a proxy and then retarget 3D motion data onto it. However, existing image-to-3D methods do not work well for amateur character drawings in terms of appearance and geometry. We observe that the contour lines commonly found in character drawings introduce significant ambiguity in texture synthesis due to their view dependence. Additionally, thin regions represented by single-line contours (e.g., the slim limbs of a stick figure) are difficult to reconstruct due to their delicate structure. To address these issues, we propose a novel system, DrawingSpinUp, to produce plausible 3D animations and breathe life into character drawings, allowing them to freely spin up, leap, and even perform a hip-hop dance. For appearance improvement, we adopt a removal-then-restoration strategy that first removes the view-dependent contour lines and then renders them back after retargeting the reconstructed character. For geometry refinement, we develop a skeleton-based thinning deformation algorithm to refine the slim structures represented by the single-line contours. Experimental evaluations and a perceptual user study show that our proposed method outperforms existing 2D and 3D animation methods and generates high-quality 3D animations from a single character drawing. Please refer to our project page (https://lordliang.github.io/DrawingSpinUp) for the code and generated animations.
{"title":"DrawingSpinUp: 3D Animation from Single Character Drawings","authors":"Jie Zhou, Chufeng Xiao, Miu-Ling Lam, Hongbo Fu","doi":"arxiv-2409.08615","DOIUrl":"https://doi.org/arxiv-2409.08615","url":null,"abstract":"Animating various character drawings is an engaging visual content creation\u0000task. Given a single character drawing, existing animation methods are limited\u0000to flat 2D motions and thus lack 3D effects. An alternative solution is to\u0000reconstruct a 3D model from a character drawing as a proxy and then retarget 3D\u0000motion data onto it. However, the existing image-to-3D methods could not work\u0000well for amateur character drawings in terms of appearance and geometry. We\u0000observe the contour lines, commonly existing in character drawings, would\u0000introduce significant ambiguity in texture synthesis due to their\u0000view-dependence. Additionally, thin regions represented by single-line contours\u0000are difficult to reconstruct (e.g., slim limbs of a stick figure) due to their\u0000delicate structures. To address these issues, we propose a novel system,\u0000DrawingSpinUp, to produce plausible 3D animations and breathe life into\u0000character drawings, allowing them to freely spin up, leap, and even perform a\u0000hip-hop dance. For appearance improvement, we adopt a removal-then-restoration\u0000strategy to first remove the view-dependent contour lines and then render them\u0000back after retargeting the reconstructed character. For geometry refinement, we\u0000develop a skeleton-based thinning deformation algorithm to refine the slim\u0000structures represented by the single-line contours. The experimental\u0000evaluations and a perceptual user study show that our proposed method\u0000outperforms the existing 2D and 3D animation methods and generates high-quality\u00003D animations from a single character drawing. Please refer to our project page\u0000(https://lordliang.github.io/DrawingSpinUp) for the code and generated\u0000animations.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"122 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yohan Poirier-Ginter, Alban Gauthier, Julien Phillip, Jean-Francois Lalonde, George Drettakis
Relighting radiance fields is severely underconstrained for multi-view data, which is most often captured under a single illumination condition; it is especially hard for full scenes containing multiple objects. We introduce a method to create relightable radiance fields from such single-illumination data by exploiting priors extracted from 2D image diffusion models. We first fine-tune a 2D diffusion model on a multi-illumination dataset conditioned on light direction, allowing us to augment a single-illumination capture into a realistic, but possibly inconsistent, multi-illumination dataset with directly defined light directions. We use this augmented data to create a relightable radiance field represented by 3D Gaussian splats. To allow direct control of light direction for low-frequency lighting, we represent appearance with a multi-layer perceptron parameterized on light direction. To enforce multi-view consistency and overcome inaccuracies, we optimize a per-image auxiliary feature vector. We show results on synthetic and real multi-view data under single illumination, demonstrating that our method successfully exploits 2D diffusion model priors to allow realistic 3D relighting of complete scenes. Project site: https://repo-sam.inria.fr/fungraph/generative-radiance-field-relighting/
{"title":"A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis","authors":"Yohan Poirier-Ginter, Alban Gauthier, Julien Phillip, Jean-Francois Lalonde, George Drettakis","doi":"arxiv-2409.08947","DOIUrl":"https://doi.org/arxiv-2409.08947","url":null,"abstract":"Relighting radiance fields is severely underconstrained for multi-view data,\u0000which is most often captured under a single illumination condition; It is\u0000especially hard for full scenes containing multiple objects. We introduce a\u0000method to create relightable radiance fields using such single-illumination\u0000data by exploiting priors extracted from 2D image diffusion models. We first\u0000fine-tune a 2D diffusion model on a multi-illumination dataset conditioned by\u0000light direction, allowing us to augment a single-illumination capture into a\u0000realistic -- but possibly inconsistent -- multi-illumination dataset from\u0000directly defined light directions. We use this augmented data to create a\u0000relightable radiance field represented by 3D Gaussian splats. To allow direct\u0000control of light direction for low-frequency lighting, we represent appearance\u0000with a multi-layer perceptron parameterized on light direction. To enforce\u0000multi-view consistency and overcome inaccuracies we optimize a per-image\u0000auxiliary feature vector. We show results on synthetic and real multi-view data\u0000under single illumination, demonstrating that our method successfully exploits\u00002D diffusion model priors to allow realistic 3D relighting for complete scenes.\u0000Project site\u0000https://repo-sam.inria.fr/fungraph/generative-radiance-field-relighting/","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"105 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that achieves high-quality reconstruction and real-time rendering of complex scenes. However, its rasterization pipeline still suffers from unnecessary overhead caused by avoidable serial Gaussian culling, and from uneven load due to the varying number of Gaussians to be rendered across pixels, which hinders wider adoption of 3DGS. To accelerate Gaussian splatting, we propose AdR-Gaussian, which moves part of the serial culling in the Render stage into the earlier Preprocess stage to enable parallel culling, employs an adaptive radius to narrow the rendering pixel range of each Gaussian, and introduces a load balancing method to minimize thread waiting time during pixel-parallel rendering. Our contributions are threefold, achieving a rendering speed of 310% while maintaining quality equivalent to or better than the state of the art. First, we propose to cull Gaussian-tile pairs of low splatting opacity early, based on an adaptive radius in the Gaussian-parallel Preprocess stage; this reduces the number of affected tiles through the Gaussian bounding circle, cutting unnecessary overhead and achieving faster rendering. Second, we further propose early culling based on an axis-aligned bounding box for Gaussian splatting, which achieves a more significant reduction in wasted computation by accurately calculating the Gaussian extent along the two 2D directions. Third, we propose a balancing algorithm for pixel thread load, which compresses the information of heavily loaded pixels to reduce thread waiting time and enhances the information of lightly loaded pixels to hedge against rendering quality loss. Experiments on three datasets demonstrate that our algorithm significantly improves Gaussian splatting rendering speed.
{"title":"AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius","authors":"Xinzhe Wang, Ran Yi, Lizhuang Ma","doi":"arxiv-2409.08669","DOIUrl":"https://doi.org/arxiv-2409.08669","url":null,"abstract":"3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has\u0000achieved high-quality reconstruction and real-time rendering of complex scenes.\u0000However, the rasterization pipeline still suffers from unnecessary overhead\u0000resulting from avoidable serial Gaussian culling, and uneven load due to the\u0000distinct number of Gaussian to be rendered across pixels, which hinders wider\u0000promotion and application of 3DGS. In order to accelerate Gaussian splatting,\u0000we propose AdR-Gaussian, which moves part of serial culling in Render stage\u0000into the earlier Preprocess stage to enable parallel culling, employing\u0000adaptive radius to narrow the rendering pixel range for each Gaussian, and\u0000introduces a load balancing method to minimize thread waiting time during the\u0000pixel-parallel rendering. Our contributions are threefold, achieving a\u0000rendering speed of 310% while maintaining equivalent or even better quality\u0000than the state-of-the-art. Firstly, we propose to early cull Gaussian-Tile\u0000pairs of low splatting opacity based on an adaptive radius in the\u0000Gaussian-parallel Preprocess stage, which reduces the number of affected tile\u0000through the Gaussian bounding circle, thus reducing unnecessary overhead and\u0000achieving faster rendering speed. Secondly, we further propose early culling\u0000based on axis-aligned bounding box for Gaussian splatting, which achieves a\u0000more significant reduction in ineffective expenses by accurately calculating\u0000the Gaussian size in the 2D directions. Thirdly, we propose a balancing\u0000algorithm for pixel thread load, which compresses the information of heavy-load\u0000pixels to reduce thread waiting time, and enhance information of light-load\u0000pixels to hedge against rendering quality loss. Experiments on three datasets\u0000demonstrate that our algorithm can significantly improve the Gaussian Splatting\u0000rendering speed.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between the digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impede broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time, high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to represent motion and appearance separately using corresponding joint and skin Gaussians. Such explicit disentanglement significantly reduces motion redundancy and enhances temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. We then employ a coarse-to-fine training strategy for frame-by-frame human performance modeling, which includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, requiring only approximately 350 KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.
{"title":"Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos","authors":"Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu","doi":"arxiv-2409.08353","DOIUrl":"https://doi.org/arxiv-2409.08353","url":null,"abstract":"Volumetric video represents a transformative advancement in visual media,\u0000enabling users to freely navigate immersive virtual experiences and narrowing\u0000the gap between digital and real worlds. However, the need for extensive manual\u0000intervention to stabilize mesh sequences and the generation of excessively\u0000large assets in existing workflows impedes broader adoption. In this paper, we\u0000present a novel Gaussian-based approach, dubbed textit{DualGS}, for real-time\u0000and high-fidelity playback of complex human performance with excellent\u0000compression ratios. Our key idea in DualGS is to separately represent motion\u0000and appearance using the corresponding skin and joint Gaussians. Such an\u0000explicit disentanglement can significantly reduce motion redundancy and enhance\u0000temporal coherence. We begin by initializing the DualGS and anchoring skin\u0000Gaussians to joint Gaussians at the first frame. Subsequently, we employ a\u0000coarse-to-fine training strategy for frame-by-frame human performance modeling.\u0000It includes a coarse alignment phase for overall motion prediction as well as a\u0000fine-grained optimization for robust tracking and high-fidelity rendering. To\u0000integrate volumetric video seamlessly into VR environments, we efficiently\u0000compress motion using entropy encoding and appearance using codec compression\u0000coupled with a persistent codebook. Our approach achieves a compression ratio\u0000of up to 120 times, only requiring approximately 350KB of storage per frame. We\u0000demonstrate the efficacy of our representation through photo-realistic,\u0000free-view experiences on VR headsets, enabling users to immersively watch\u0000musicians in performance and feel the rhythm of the notes at the performers'\u0000fingertips.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel-view synthesis based on visible light has been extensively studied. Compared to visible light imaging, thermal infrared imaging offers the advantages of all-weather imaging and strong penetration, providing increased possibilities for reconstruction in nighttime and adverse weather scenarios. However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes and manifesting as floaters and indistinct edge features in synthesized images. To address these limitations, this paper introduces a physics-induced 3D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by modeling atmospheric transmission effects and thermal conduction in three-dimensional media using neural networks. Additionally, a temperature consistency constraint is incorporated into the optimization objective to enhance the reconstruction accuracy of thermal infrared images. Furthermore, to validate the effectiveness of our method, we create the first large-scale benchmark dataset for this field, the Thermal Infrared Novel-view Synthesis Dataset (TI-NSD). This dataset comprises 20 authentic thermal infrared video scenes, covering indoor, outdoor, and UAV (unmanned aerial vehicle) scenarios, totaling 6,664 frames of thermal infrared image data. Based on this dataset, this paper experimentally verifies the effectiveness of Thermal3D-GS. The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR and significantly addresses the floaters and indistinct edge features present in the baseline. Our dataset and codebase will be released at https://github.com/mzzcdf/Thermal3DGS.
{"title":"Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis","authors":"Qian Chen, Shihao Shu, Xiangzhi Bai","doi":"arxiv-2409.08042","DOIUrl":"https://doi.org/arxiv-2409.08042","url":null,"abstract":"Novel-view synthesis based on visible light has been extensively studied. In\u0000comparison to visible light imaging, thermal infrared imaging offers the\u0000advantage of all-weather imaging and strong penetration, providing increased\u0000possibilities for reconstruction in nighttime and adverse weather scenarios.\u0000However, thermal infrared imaging is influenced by physical characteristics\u0000such as atmospheric transmission effects and thermal conduction, hindering the\u0000precise reconstruction of intricate details in thermal infrared scenes,\u0000manifesting as issues of floaters and indistinct edge features in synthesized\u0000images. To address these limitations, this paper introduces a physics-induced\u00003D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by\u0000modeling atmospheric transmission effects and thermal conduction in\u0000three-dimensional media using neural networks. Additionally, a temperature\u0000consistency constraint is incorporated into the optimization objective to\u0000enhance the reconstruction accuracy of thermal infrared images. Furthermore, to\u0000validate the effectiveness of our method, the first large-scale benchmark\u0000dataset for this field named Thermal Infrared Novel-view Synthesis Dataset\u0000(TI-NSD) is created. This dataset comprises 20 authentic thermal infrared video\u0000scenes, covering indoor, outdoor, and UAV(Unmanned Aerial Vehicle) scenarios,\u0000totaling 6,664 frames of thermal infrared image data. Based on this dataset,\u0000this paper experimentally verifies the effectiveness of Thermal3D-GS. The\u0000results indicate that our method outperforms the baseline method with a 3.03 dB\u0000improvement in PSNR and significantly addresses the issues of floaters and\u0000indistinct edge features present in the baseline method. Our dataset and\u0000codebase will be released in\u0000href{https://github.com/mzzcdf/Thermal3DGS}{textcolor{red}{Thermal3DGS}}.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}