{"title":"CLIPTexture: Text-Driven Texture Synthesis","authors":"Yiren Song","doi":"10.1145/3503161.3548146","DOIUrl":null,"url":null,"abstract":"Can artificial intelligence create textures with artistic value according to human language control? Existing texture synthesis methods require example texture input. However, in many practical situations, users don't have satisfying textures but tell designers about their needs through simple sketches and verbal descriptions. This paper proposes a novel texture synthesis framework based on the CLIP, which models the texture synthesis problem as an optimization process and realizes text-driven texture synthesis by minimizing the distance between the input image and the text prompt in latent space. Our method performs zero-shot image manipulation successfully even between unseen domains. We implement texture synthesis using two different optimization methods, the TextureNet and Diffvg, demonstrating the generality of CLIPTexture. Extensive experiments confirmed the robust and superior manipulation performance of our methods compared to the existing baselines.","PeriodicalId":412792,"journal":{"name":"Proceedings of the 30th ACM International Conference on Multimedia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3503161.3548146","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Can artificial intelligence create textures of artistic value under the control of human language? Existing texture synthesis methods require an example texture as input. In many practical situations, however, users do not have a satisfactory texture at hand and instead convey their needs to designers through simple sketches and verbal descriptions. This paper proposes a novel texture synthesis framework based on CLIP, which models texture synthesis as an optimization process and realizes text-driven synthesis by minimizing the distance between the input image and the text prompt in CLIP's latent space. Our method performs zero-shot image manipulation successfully, even between unseen domains. We implement the synthesis with two different optimization backbones, TextureNet and DiffVG, demonstrating the generality of CLIPTexture. Extensive experiments confirm that our methods are robust and outperform existing baselines in manipulation performance.
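The core idea, optimizing an image so that its CLIP embedding moves toward the embedding of a text prompt, can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: it requires PyTorch and the open-source `clip` package, it optimizes raw pixels rather than the TextureNet or DiffVG parameterizations used in the paper, and the prompt string is a made-up example.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16/fp32 mismatch when backpropagating on GPU

# CLIP's input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

# Hypothetical prompt; its CLIP embedding is the optimization target.
text = clip.tokenize(["a seamless texture of green marble"]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(text)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Optimize raw pixels as a stand-in for the paper's parameterizations;
# 224x224 matches the ViT-B/32 input resolution.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    img = (image.clamp(0, 1) - mean) / std
    img_feat = model.encode_image(img)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    # CLIP loss: cosine distance between image and text embeddings.
    loss = 1.0 - (img_feat * text_feat).sum(dim=-1).mean()
    loss.backward()
    optimizer.step()
```

Note that optimizing pixels directly against a CLIP loss tends to produce noisy, adversarial-looking patterns; routing the same loss through a structured parameterization such as a texture network or DiffVG's differentiable vector graphics, as the paper does, constrains the search space toward coherent textures.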