Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Authors: Slava Elizarov, Ciara Rowles, Simon Donné
arXiv - CS - Graphics, 2024-09-05
https://doi.org/arxiv-2409.03718
Generating high-quality 3D objects from textual descriptions remains a
challenging problem due to computational cost, the scarcity of 3D data, and
complex 3D representations. We introduce Geometry Image Diffusion
(GIMDiffusion), a novel Text-to-3D model that utilizes geometry images to
efficiently represent 3D shapes using 2D images, thereby avoiding the need for
complex 3D-aware architectures. By integrating a Collaborative Control
mechanism, we exploit the rich 2D priors of existing Text-to-Image models such
as Stable Diffusion. This enables strong generalization even with limited 3D
training data (allowing us to use only high-quality training data) and
retains compatibility with guidance techniques such as IPAdapter. In short,
GIMDiffusion enables the generation of 3D assets at speeds comparable to
current Text-to-Image models. The generated objects consist of semantically
meaningful, separate parts and include internal structures, enhancing both
usability and versatility.
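
To make the idea of a geometry image concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the simplest possible case: a single regular grid in which each pixel stores one (x, y, z) surface point and neighbouring pixels are connected into triangles. The function names `geometry_image_to_mesh` and `toy_geometry_image`, and the toy surface itself, are hypothetical and chosen only for illustration.

```python
import math


def geometry_image_to_mesh(gim):
    """Convert a geometry image into a triangle mesh.

    gim is a list of rows; each pixel is an (x, y, z) tuple.
    Assumption for this sketch: one regular grid, no chart boundaries.
    """
    height = len(gim)
    width = len(gim[0])
    # Every pixel becomes one vertex, stored in row-major order.
    vertices = [tuple(gim[r][c]) for r in range(height) for c in range(width)]
    faces = []
    for r in range(height - 1):
        for c in range(width - 1):
            top_left = r * width + c
            top_right = top_left + 1
            bottom_left = top_left + width
            bottom_right = bottom_left + 1
            # Split each 2x2 pixel quad into two triangles.
            faces.append((top_left, bottom_left, top_right))
            faces.append((top_right, bottom_left, bottom_right))
    return vertices, faces


def toy_geometry_image(size):
    """Build a small made-up geometry image: a gently bulging square patch."""
    gim = []
    for r in range(size):
        row = []
        for c in range(size):
            u = r / (size - 1)
            v = c / (size - 1)
            z = 0.25 * math.sin(math.pi * u) * math.sin(math.pi * v)
            row.append((u, v, z))
        gim.append(row)
    return gim


vertices, faces = geometry_image_to_mesh(toy_geometry_image(4))
print(len(vertices), len(faces))  # 16 vertices, 18 triangles
```

Because the surface lives in an ordinary image grid, a 2D image model can in principle produce it directly, and the mesh connectivity follows from pixel adjacency rather than being predicted separately.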