Highly realistic synthetic dataset for pixel-level DensePose estimation via diffusion model
Authors: Jiaxiao Wen, Tao Chu, Qiong Liu
Journal: Pattern Recognition, Volume 159, Article 111137 (JCR Q1, Computer Science, Artificial Intelligence; impact factor 7.5)
Published: 2024-11-06
DOI: 10.1016/j.patcog.2024.111137
URL: https://www.sciencedirect.com/science/article/pii/S0031320324008884
Generating training data with pixel-level annotations for DensePose is labor-intensive, which is why real-world datasets are only sparsely labeled. Prior solutions have relied on specialized data generation systems to synthesize datasets. However, these synthetic datasets often lack realism and depend on expensive resources such as human body models and texture mappings. In this paper, we address these challenges by introducing a novel data generation method based on the diffusion model, which produces highly realistic data without the need for expensive resources. Specifically, our method comprises two stages: annotation generation and image generation. Using graphics renderers and SMPL models, we produce synthetic annotations based solely on human poses and shapes. Then, guided by these annotations, we use simple yet effective textual prompts to generate a wide range of realistic images with the diffusion model. Our experiments on the DensePose-COCO dataset demonstrate the superiority of our method over existing methods. Code and benchmarks will be released.
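The abstract does not say how the rendered annotations are encoded or how the prompts are worded, so the following is only a minimal sketch under stated assumptions. It assumes the annotation is stored in the common DensePose IUV layout (a part index from 1 to 24 plus per-pixel surface coordinates u and v in [0, 1]) and packs it into a 3-channel image that could serve as a conditioning input. The function names `pack_iuv` and `build_prompt`, the 8-bit quantization, and the prompt template are hypothetical and are not taken from the paper; the SMPL rendering step and the diffusion model itself are outside the scope of the sketch.

```python
import numpy as np

NUM_PARTS = 24  # DensePose divides the body surface into 24 parts


def pack_iuv(part_index: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Pack a part-index map and (u, v) maps into an HxWx3 uint8 image.

    part_index: integer array, 0 = background, 1..24 = body part.
    u, v: float arrays in [0, 1]; values on background pixels are ignored.
    The 8-bit quantization is an assumption made for illustration.
    """
    if not (part_index.shape == u.shape == v.shape):
        raise ValueError("part_index, u and v must share one HxW shape")
    if part_index.min() < 0 or part_index.max() > NUM_PARTS:
        raise ValueError("part index out of range")

    foreground = part_index > 0
    iuv = np.zeros(part_index.shape + (3,), dtype=np.uint8)
    iuv[..., 0] = part_index
    # Scale u and v to 0..255 and zero them out on background pixels.
    iuv[..., 1] = np.where(foreground, np.round(np.clip(u, 0, 1) * 255), 0)
    iuv[..., 2] = np.where(foreground, np.round(np.clip(v, 0, 1) * 255), 0)
    return iuv


def build_prompt(subject: str, scene: str) -> str:
    """Build a simple text prompt (hypothetical template, not the paper's)."""
    return f"a photo of {subject}, {scene}, realistic, high quality"


if __name__ == "__main__":
    parts = np.array([[0, 1], [24, 2]])
    u = np.array([[0.7, 1.0], [0.0, 0.2]])
    v = np.array([[0.3, 0.0], [1.0, 0.4]])
    print(pack_iuv(parts, u, v))
    print(build_prompt("a woman jogging", "in a park"))
```

In a pipeline of the kind the abstract describes, an image like the one returned by `pack_iuv` would be passed to an annotation-conditioned diffusion model together with the text prompt. Which conditioning mechanism the authors use is not stated here.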
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.