{"title":"OctFusion:基于八维扩散模型的三维形状生成","authors":"Bojun Xiong, Si-Tong Wei, Xin-Yang Zheng, Yan-Pei Cao, Zhouhui Lian, Peng-Shuai Wang","doi":"arxiv-2408.14732","DOIUrl":null,"url":null,"abstract":"Diffusion models have emerged as a popular method for 3D generation. However,\nit is still challenging for diffusion models to efficiently generate diverse\nand high-quality 3D shapes. In this paper, we introduce OctFusion, which can\ngenerate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia\n4090 GPU, and the extracted meshes are guaranteed to be continuous and\nmanifold. The key components of OctFusion are the octree-based latent\nrepresentation and the accompanying diffusion models. The representation\ncombines the benefits of both implicit neural representations and explicit\nspatial octrees and is learned with an octree-based variational autoencoder.\nThe proposed diffusion model is a unified multi-scale U-Net that enables\nweights and computation sharing across different octree levels and avoids the\ncomplexity of widely used cascaded diffusion schemes. We verify the\neffectiveness of OctFusion on the ShapeNet and Objaverse datasets and achieve\nstate-of-the-art performances on shape generation tasks. We demonstrate that\nOctFusion is extendable and flexible by generating high-quality color fields\nfor textured mesh generation and high-quality 3D shapes conditioned on text\nprompts, sketches, or category labels. Our code and pre-trained models are\navailable at \\url{https://github.com/octree-nn/octfusion}.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"OctFusion: Octree-based Diffusion Models for 3D Shape Generation\",\"authors\":\"Bojun Xiong, Si-Tong Wei, Xin-Yang Zheng, Yan-Pei Cao, Zhouhui Lian, Peng-Shuai Wang\",\"doi\":\"arxiv-2408.14732\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Diffusion models have emerged as a popular method for 3D generation. However,\\nit is still challenging for diffusion models to efficiently generate diverse\\nand high-quality 3D shapes. In this paper, we introduce OctFusion, which can\\ngenerate 3D shapes with arbitrary resolutions in 2.5 seconds on a single Nvidia\\n4090 GPU, and the extracted meshes are guaranteed to be continuous and\\nmanifold. The key components of OctFusion are the octree-based latent\\nrepresentation and the accompanying diffusion models. The representation\\ncombines the benefits of both implicit neural representations and explicit\\nspatial octrees and is learned with an octree-based variational autoencoder.\\nThe proposed diffusion model is a unified multi-scale U-Net that enables\\nweights and computation sharing across different octree levels and avoids the\\ncomplexity of widely used cascaded diffusion schemes. We verify the\\neffectiveness of OctFusion on the ShapeNet and Objaverse datasets and achieve\\nstate-of-the-art performances on shape generation tasks. We demonstrate that\\nOctFusion is extendable and flexible by generating high-quality color fields\\nfor textured mesh generation and high-quality 3D shapes conditioned on text\\nprompts, sketches, or category labels. 
Our code and pre-trained models are\\navailable at \\\\url{https://github.com/octree-nn/octfusion}.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.14732\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14732","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
OctFusion: Octree-based Diffusion Models for 3D Shape Generation
Diffusion models have emerged as a popular method for 3D generation. However,
it is still challenging for diffusion models to efficiently generate diverse
and high-quality 3D shapes. In this paper, we introduce OctFusion, which can
generate 3D shapes at arbitrary resolutions in 2.5 seconds on a single Nvidia
4090 GPU, and the extracted meshes are guaranteed to be continuous and
manifold. The key components of OctFusion are the octree-based latent
representation and the accompanying diffusion models. The representation
combines the benefits of both implicit neural representations and explicit
spatial octrees and is learned with an octree-based variational autoencoder.
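
To make the latent representation concrete, here is a minimal sketch (assuming PyTorch) of a per-node latent VAE: each occupied octree node carries a feature vector, and an encoder/decoder maps features to and from a small latent code. All names and dimensions are illustrative stand-ins; the paper's VAE presumably operates with octree convolutions rather than the plain per-node linear layers used below.

import torch
import torch.nn as nn

class OctreeLatentVAE(nn.Module):
    """Toy per-node VAE over occupied octree nodes (illustrative only)."""
    def __init__(self, feat_dim=32, latent_dim=8):
        super().__init__()
        self.enc = nn.Linear(feat_dim, 2 * latent_dim)  # -> (mu, logvar)
        self.dec = nn.Linear(latent_dim, feat_dim)      # latent -> node feature

    def forward(self, node_feats):
        # node_feats: (num_occupied_nodes, feat_dim), one row per octree node
        mu, logvar = self.enc(node_feats).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
        return recon, kl

vae = OctreeLatentVAE()
feats = torch.randn(1024, 32)  # hypothetical features of 1024 occupied nodes
recon, kl = vae(feats)
loss = nn.functional.mse_loss(recon, feats) + 1e-4 * kl  # reconstruction + KL
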
The proposed diffusion model is a unified multi-scale U-Net that shares
weights and computation across different octree levels, avoiding the
complexity of widely used cascaded diffusion schemes.
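
The contrast with cascaded schemes can be sketched as follows: a single denoiser body is reused at every octree level, with only a tiny per-level embedding distinguishing resolutions, whereas a cascaded scheme would train a separate network per resolution. Everything below is a hypothetical stand-in, not the paper's actual U-Net architecture.

import torch
import torch.nn as nn

class SharedDenoiser(nn.Module):
    """One set of weights reused across octree levels (illustrative only)."""
    def __init__(self, feat_dim=32, num_levels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.SiLU(), nn.Linear(128, feat_dim))
        # Only this small embedding differs per level; the heavy weights are shared.
        self.level_emb = nn.Embedding(num_levels, feat_dim)

    def forward(self, noisy_feats, level):
        # noisy_feats: (num_nodes_at_level, feat_dim)
        return self.body(noisy_feats + self.level_emb(level))

model = SharedDenoiser()
for level, num_nodes in enumerate([64, 512, 4096]):  # coarse -> fine levels
    x_t = torch.randn(num_nodes, 32)                 # noisy node features
    eps_pred = model(x_t, torch.tensor(level))       # same weights at every level
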
We verify the effectiveness of OctFusion on the ShapeNet and Objaverse
datasets, where it achieves state-of-the-art performance on shape generation
tasks. We demonstrate that OctFusion is extensible and flexible by generating
high-quality color fields for textured mesh generation and high-quality 3D
shapes conditioned on text prompts, sketches, or category labels. Our code
and pre-trained models are available at https://github.com/octree-nn/octfusion.
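
For the conditional variants, one standard recipe for such conditioning (a guess at the mechanism, not a description of OctFusion's actual code) is classifier-free guidance over a condition embedding derived from the text prompt, sketch, or category label. The sketch below uses a dummy denoiser so it runs as-is; consult the repository README for the real interface.

import torch

def denoiser(x_t, t, cond):
    # Dummy stand-in for a trained octree denoiser; only shapes matter here.
    return x_t + cond

def guided_eps(x_t, t, cond_emb, null_emb, scale=3.0):
    # Classifier-free guidance: blend conditional and unconditional predictions.
    eps_cond = denoiser(x_t, t, cond_emb)    # conditioned on prompt/sketch/label
    eps_uncond = denoiser(x_t, t, null_emb)  # unconditional pass
    return eps_uncond + scale * (eps_cond - eps_uncond)

x_t = torch.randn(1024, 32)        # noisy latent features of occupied nodes
cond = torch.randn(1, 32)          # e.g., an embedding of a text prompt
eps = guided_eps(x_t, t=torch.tensor(500), cond_emb=cond,
                 null_emb=torch.zeros(1, 32))
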