VI3DRM: Towards Meticulous 3D Reconstruction from Sparse Views via Photo-Realistic Novel View Synthesis

arXiv - CS - Computer Vision and Pattern Recognition | Pub date: 2024-09-12 | DOI: arxiv-2409.08207

Hao Chen, Jiafu Wu, Ying Jin, Jinlong Peng, Xiaofeng Mao, Mingmin Chi, Mufeng Yao, Bo Peng, Jian Li, Yun Cao


Citations: 0

Abstract



Recently, methods such as Zero-1-to-3 have focused on single-view 3D reconstruction and have achieved remarkable success. However, their predictions for unseen regions rely heavily on the inductive bias of large-scale pretrained diffusion models. Although subsequent work such as DreamComposer attempts to make predictions more controllable by incorporating additional views, the results remain unrealistic because of feature entanglement in the vanilla latent space, where factors such as lighting, material, and structure are mixed together. To address these issues, we introduce the Visual Isotropy 3D Reconstruction Model (VI3DRM), a diffusion-based sparse-view 3D reconstruction model that operates within an ID-consistent, perspective-disentangled 3D latent space. By disentangling semantic information, color, material properties, and lighting, VI3DRM can generate highly realistic images that are indistinguishable from real photographs. By leveraging both real and synthesized images, our approach enables accurate construction of pointmaps, ultimately producing finely textured meshes or point clouds. On the novel view synthesis (NVS) task on the GSO dataset, VI3DRM significantly outperforms the state-of-the-art method DreamComposer, achieving a PSNR of 38.61, an SSIM of 0.929, and an LPIPS of 0.027. Code will be made available upon publication.
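To give a feel for what the reported PSNR means, the sketch below implements the standard per-image formula PSNR = 10 * log10(MAX^2 / MSE) and its inverse. It is not the authors' evaluation code: the abstract does not state the image resolution, value range, masking, or averaging protocol used on GSO, so the flat pixel lists, the default `max_val=255`, and the helper names `mse`, `psnr`, and `rmse_for_psnr` are illustrative assumptions.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal size."""
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and the same size")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE).

    Assumption: max_val is the peak intensity (255 for 8-bit images,
    1.0 for images normalized to [0, 1]).
    """
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_val ** 2 / err)


def rmse_for_psnr(psnr_db: float, max_val: float = 255.0) -> float:
    """Invert the formula: PSNR = 20 * log10(MAX / RMSE), so RMSE = MAX / 10^(PSNR / 20)."""
    return max_val / (10.0 ** (psnr_db / 20.0))


if __name__ == "__main__":
    target = [100, 150, 200, 250]
    pred = [101, 149, 202, 250]
    print(f"PSNR of toy example: {psnr(pred, target):.2f} dB")
    print(f"RMSE implied by 38.61 dB on an 8-bit scale: {rmse_for_psnr(38.61):.2f}")
```

Because PSNR is logarithmic, a value of 38.61 corresponds, under this per-image formula, to a root-mean-square error of roughly 1.2% of the full intensity range (about 3 levels on an 8-bit scale).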

