MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su
{"title":"MeshFormer:利用三维引导重建模型生成高质量网格","authors":"Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su","doi":"arxiv-2408.10198","DOIUrl":null,"url":null,"abstract":"Open-world 3D reconstruction models have recently garnered significant\nattention. However, without sufficient 3D inductive bias, existing methods\ntypically entail expensive training costs and struggle to extract high-quality\n3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction\nmodel that explicitly leverages 3D native structure, input guidance, and\ntraining supervision. Specifically, instead of using a triplane representation,\nwe store features in 3D sparse voxels and combine transformers with 3D\nconvolutions to leverage an explicit 3D structure and projective bias. In\naddition to sparse-view RGB input, we require the network to take input and\ngenerate corresponding normal maps. The input normal maps can be predicted by\n2D diffusion models, significantly aiding in the guidance and refinement of the\ngeometry's learning. Moreover, by combining Signed Distance Function (SDF)\nsupervision with surface rendering, we directly learn to generate high-quality\nmeshes without the need for complex multi-stage training processes. By\nincorporating these explicit 3D biases, MeshFormer can be trained efficiently\nand deliver high-quality textured meshes with fine-grained geometric details.\nIt can also be integrated with 2D diffusion models to enable fast\nsingle-image-to-3D and text-to-3D tasks. Project page:\nhttps://meshformer3d.github.io","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"29 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model\",\"authors\":\"Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su\",\"doi\":\"arxiv-2408.10198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Open-world 3D reconstruction models have recently garnered significant\\nattention. However, without sufficient 3D inductive bias, existing methods\\ntypically entail expensive training costs and struggle to extract high-quality\\n3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction\\nmodel that explicitly leverages 3D native structure, input guidance, and\\ntraining supervision. Specifically, instead of using a triplane representation,\\nwe store features in 3D sparse voxels and combine transformers with 3D\\nconvolutions to leverage an explicit 3D structure and projective bias. In\\naddition to sparse-view RGB input, we require the network to take input and\\ngenerate corresponding normal maps. The input normal maps can be predicted by\\n2D diffusion models, significantly aiding in the guidance and refinement of the\\ngeometry's learning. Moreover, by combining Signed Distance Function (SDF)\\nsupervision with surface rendering, we directly learn to generate high-quality\\nmeshes without the need for complex multi-stage training processes. 
By\\nincorporating these explicit 3D biases, MeshFormer can be trained efficiently\\nand deliver high-quality textured meshes with fine-grained geometric details.\\nIt can also be integrated with 2D diffusion models to enable fast\\nsingle-image-to-3D and text-to-3D tasks. Project page:\\nhttps://meshformer3d.github.io\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":\"29 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.10198\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10198","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. Specifically, instead of using a triplane representation, we store features in 3D sparse voxels and combine transformers with 3D convolutions to leverage an explicit 3D structure and projective bias. In addition to sparse-view RGB input, we require the network to take input and generate corresponding normal maps. The input normal maps can be predicted by 2D diffusion models, significantly aiding in the guidance and refinement of the geometry's learning. Moreover, by combining Signed Distance Function (SDF) supervision with surface rendering, we directly learn to generate high-quality meshes without the need for complex multi-stage training processes. By incorporating these explicit 3D biases, MeshFormer can be trained efficiently and deliver high-quality textured meshes with fine-grained geometric details. It can also be integrated with 2D diffusion models to enable fast single-image-to-3D and text-to-3D tasks. Project page: https://meshformer3d.github.io
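To make the architectural idea in the abstract more concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation) of a voxel feature block that interleaves a 3D convolution, supplying the explicit local 3D structure, with transformer self-attention over voxel tokens for global context. The class name, feature sizes, and the use of a dense rather than sparse voxel grid are assumptions made purely for brevity.

```python
# Illustrative sketch only: a hypothetical block mixing a 3D convolution
# (local, explicit 3D inductive bias) with self-attention over voxel tokens
# (global context). Uses a dense grid for simplicity; MeshFormer itself
# stores features in 3D *sparse* voxels.
import torch
import torch.nn as nn


class VoxelConvAttnBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv3d(dim, dim, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) voxel feature grid
        x = x + self.conv(x)                    # local 3D structure via convolution
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, D*H*W, C) voxel tokens
        tokens = self.norm(tokens)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attn_out              # global context via self-attention
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)


if __name__ == "__main__":
    block = VoxelConvAttnBlock(dim=32)
    feats = torch.randn(1, 32, 16, 16, 16)      # toy 16^3 feature grid
    print(block(feats).shape)                   # torch.Size([1, 32, 16, 16, 16])
```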