{"title":"DiffMat:用于图像引导材料生成的潜在扩散模型","authors":"Liang Yuan , Dingkun Yan , Suguru Saito , Issei Fujishiro","doi":"10.1016/j.visinf.2023.12.001","DOIUrl":null,"url":null,"abstract":"<div><p>Creating realistic materials is essential in the construction of immersive virtual environments. While existing techniques for material capture and conditional generation rely on flash-lit photos, they often produce artifacts when the illumination mismatches the training data. In this study, we introduce DiffMat, a novel diffusion model that integrates the CLIP image encoder and a multi-layer, cross-attention denoising backbone to generate latent materials from images under various illuminations. Using a pre-trained StyleGAN-based material generator, our method converts these latent materials into high-resolution SVBRDF textures, a process that enables a seamless fit into the standard physically based rendering pipeline, reducing the requirements for vast computational resources and expansive datasets. DiffMat surpasses existing generative methods in terms of material quality and variety, and shows adaptability to a broader spectrum of lighting conditions in reference images.</p></div>","PeriodicalId":36903,"journal":{"name":"Visual Informatics","volume":"8 1","pages":"Pages 6-14"},"PeriodicalIF":3.8000,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2468502X24000019/pdfft?md5=fb0200304a9b292debbf18a3162d10e8&pid=1-s2.0-S2468502X24000019-main.pdf","citationCount":"0","resultStr":"{\"title\":\"DiffMat: Latent diffusion models for image-guided material generation\",\"authors\":\"Liang Yuan , Dingkun Yan , Suguru Saito , Issei Fujishiro\",\"doi\":\"10.1016/j.visinf.2023.12.001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Creating realistic materials is essential in the construction of immersive virtual environments. While existing techniques for material capture and conditional generation rely on flash-lit photos, they often produce artifacts when the illumination mismatches the training data. In this study, we introduce DiffMat, a novel diffusion model that integrates the CLIP image encoder and a multi-layer, cross-attention denoising backbone to generate latent materials from images under various illuminations. Using a pre-trained StyleGAN-based material generator, our method converts these latent materials into high-resolution SVBRDF textures, a process that enables a seamless fit into the standard physically based rendering pipeline, reducing the requirements for vast computational resources and expansive datasets. DiffMat surpasses existing generative methods in terms of material quality and variety, and shows adaptability to a broader spectrum of lighting conditions in reference images.</p></div>\",\"PeriodicalId\":36903,\"journal\":{\"name\":\"Visual Informatics\",\"volume\":\"8 1\",\"pages\":\"Pages 6-14\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2024-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2468502X24000019/pdfft?md5=fb0200304a9b292debbf18a3162d10e8&pid=1-s2.0-S2468502X24000019-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Visual Informatics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2468502X24000019\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Informatics","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468502X24000019","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
DiffMat: Latent diffusion models for image-guided material generation
Creating realistic materials is essential in the construction of immersive virtual environments. While existing techniques for material capture and conditional generation rely on flash-lit photos, they often produce artifacts when the illumination mismatches the training data. In this study, we introduce DiffMat, a novel diffusion model that integrates the CLIP image encoder and a multi-layer, cross-attention denoising backbone to generate latent materials from images under various illuminations. Using a pre-trained StyleGAN-based material generator, our method converts these latent materials into high-resolution SVBRDF textures, a process that enables a seamless fit into the standard physically based rendering pipeline, reducing the requirements for vast computational resources and expansive datasets. DiffMat surpasses existing generative methods in terms of material quality and variety, and shows adaptability to a broader spectrum of lighting conditions in reference images.