MVGaussian: High-Fidelity Text-to-3D Content Generation with Multi-View Guidance and Surface Densification

Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera
arXiv - CS - Graphics · Published 2024-09-10 · DOI: arxiv-2409.06620
Citations: 0

Abstract

The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities due to imprecise guidance. Additionally, while recent advancements in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.
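The abstract's core geometric idea, pulling Gaussian splats toward the object surface so detail concentrates where the geometry lives, can be illustrated with a minimal sketch. This is not the paper's algorithm (the abstract gives no details); it is a hypothetical toy in which the surface is given as a signed distance function (SDF) and each Gaussian center within a narrow band is projected along the SDF gradient onto the zero level set:

```python
import numpy as np

def densify_toward_surface(centers, sdf, sdf_grad, step=1.0, band=0.1):
    """Illustrative sketch (not the paper's method): pull Gaussian centers
    lying within a narrow band of the surface onto the zero level set of a
    signed distance function, so splats concentrate near the geometry.

    centers  : (N, 3) array of Gaussian means
    sdf      : callable mapping (N, 3) points -> (N,) signed distances
    sdf_grad : callable mapping (N, 3) points -> (N, 3) SDF gradients
    """
    d = sdf(centers)                               # signed distance per center
    n = sdf_grad(centers)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)  # unit surface normals
    near = np.abs(d) < band                        # only adjust near-surface centers
    moved = centers.copy()
    # Walk each near-surface center along its normal by its signed distance.
    moved[near] -= step * d[near, None] * n[near]
    return moved

# Toy surface: the unit sphere, with sdf(x) = |x| - 1 and grad = x / |x|.
sphere_sdf = lambda x: np.linalg.norm(x, axis=1) - 1.0
sphere_grad = lambda x: x / np.linalg.norm(x, axis=1, keepdims=True)

pts = np.array([[1.05, 0.0, 0.0],   # just outside the sphere -> snapped on
                [0.0, 0.95, 0.0],   # just inside the sphere  -> snapped on
                [3.0, 0.0, 0.0]])   # far outside the band    -> untouched
out = densify_toward_surface(pts, sphere_sdf, sphere_grad)
```

In a real splatting pipeline the surface is not known in advance, so any such step would have to use a surface estimate (e.g. from rendered depth); the sketch only conveys the geometric intent of surface-aligned densification.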