MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su
{"title":"MeshFormer:利用三维引导重建模型生成高质量网格","authors":"Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su","doi":"arxiv-2408.10198","DOIUrl":null,"url":null,"abstract":"Open-world 3D reconstruction models have recently garnered significant\nattention. However, without sufficient 3D inductive bias, existing methods\ntypically entail expensive training costs and struggle to extract high-quality\n3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction\nmodel that explicitly leverages 3D native structure, input guidance, and\ntraining supervision. Specifically, instead of using a triplane representation,\nwe store features in 3D sparse voxels and combine transformers with 3D\nconvolutions to leverage an explicit 3D structure and projective bias. In\naddition to sparse-view RGB input, we require the network to take input and\ngenerate corresponding normal maps. The input normal maps can be predicted by\n2D diffusion models, significantly aiding in the guidance and refinement of the\ngeometry's learning. Moreover, by combining Signed Distance Function (SDF)\nsupervision with surface rendering, we directly learn to generate high-quality\nmeshes without the need for complex multi-stage training processes. By\nincorporating these explicit 3D biases, MeshFormer can be trained efficiently\nand deliver high-quality textured meshes with fine-grained geometric details.\nIt can also be integrated with 2D diffusion models to enable fast\nsingle-image-to-3D and text-to-3D tasks. Project page:\nhttps://meshformer3d.github.io","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"29 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model\",\"authors\":\"Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su\",\"doi\":\"arxiv-2408.10198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Open-world 3D reconstruction models have recently garnered significant\\nattention. However, without sufficient 3D inductive bias, existing methods\\ntypically entail expensive training costs and struggle to extract high-quality\\n3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction\\nmodel that explicitly leverages 3D native structure, input guidance, and\\ntraining supervision. Specifically, instead of using a triplane representation,\\nwe store features in 3D sparse voxels and combine transformers with 3D\\nconvolutions to leverage an explicit 3D structure and projective bias. In\\naddition to sparse-view RGB input, we require the network to take input and\\ngenerate corresponding normal maps. The input normal maps can be predicted by\\n2D diffusion models, significantly aiding in the guidance and refinement of the\\ngeometry's learning. Moreover, by combining Signed Distance Function (SDF)\\nsupervision with surface rendering, we directly learn to generate high-quality\\nmeshes without the need for complex multi-stage training processes. 
By\\nincorporating these explicit 3D biases, MeshFormer can be trained efficiently\\nand deliver high-quality textured meshes with fine-grained geometric details.\\nIt can also be integrated with 2D diffusion models to enable fast\\nsingle-image-to-3D and text-to-3D tasks. Project page:\\nhttps://meshformer3d.github.io\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":\"29 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.10198\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10198","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. Specifically, instead of using a triplane representation, we store features in 3D sparse voxels and combine transformers with 3D convolutions to leverage an explicit 3D structure and projective bias. In addition to sparse-view RGB input, we require the network to take input and generate corresponding normal maps. The input normal maps can be predicted by 2D diffusion models, significantly aiding in the guidance and refinement of the geometry's learning. Moreover, by combining Signed Distance Function (SDF) supervision with surface rendering, we directly learn to generate high-quality meshes without the need for complex multi-stage training processes. By incorporating these explicit 3D biases, MeshFormer can be trained efficiently and deliver high-quality textured meshes with fine-grained geometric details. It can also be integrated with 2D diffusion models to enable fast single-image-to-3D and text-to-3D tasks. Project page: https://meshformer3d.github.io
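To make the architectural idea in the abstract more concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation) of a voxel feature block that interleaves a 3D convolution, supplying the explicit local 3D structure, with transformer self-attention over voxel tokens for global context. The class name, feature sizes, and the use of a dense rather than sparse voxel grid are assumptions made purely for brevity.

```python
# Illustrative sketch only: a hypothetical block mixing a 3D convolution
# (local, explicit 3D inductive bias) with self-attention over voxel tokens
# (global context). Uses a dense grid for simplicity; MeshFormer itself
# stores features in 3D *sparse* voxels.
import torch
import torch.nn as nn


class VoxelConvAttnBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Conv3d(dim, dim, kernel_size=3, padding=1)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) voxel feature grid
        x = x + self.conv(x)                    # local 3D structure via convolution
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, D*H*W, C) voxel tokens
        tokens = self.norm(tokens)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = tokens + attn_out              # global context via self-attention
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)


if __name__ == "__main__":
    block = VoxelConvAttnBlock(dim=32)
    feats = torch.randn(1, 32, 16, 16, 16)      # toy 16^3 feature grid
    print(block(feats).shape)                   # torch.Size([1, 32, 16, 16, 16])
```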