{"title":"精确区域语义辅助 GAN,用于生成姿态引导的人物图像","authors":"Ji Liu, Zhenyu Weng, Yuesheng Zhu","doi":"10.1049/cit2.12255","DOIUrl":null,"url":null,"abstract":"<p>Generating a realistic person's image from one source pose conditioned on another different target pose is a promising computer vision task. The previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but rarely investigate the region-based human semantic information. Some current methods that adopt the parsing map neither consider the precise local pose-semantic matching issues nor the correspondence between two different poses. In this study, a Region Semantics-Assisted Generative Adversarial Network (RSA-GAN) is proposed for the pose-guided person image generation task. In particular, a regional pose-guided semantic fusion module is first developed to solve the imprecise match issue between the semantic parsing map from a certain source image and the corresponding keypoints in the source pose. To well align the style of the human in the source image with the target pose, a pose correspondence guided style injection module is designed to learn the correspondence between the source pose and the target pose. In addition, one gated depth-wise convolutional cross-attention based style integration module is proposed to distribute the well-aligned coarse style information together with the precisely matched pose-guided semantic information towards the target pose. The experimental results indicate that the proposed RSA-GAN achieves a <b>23%</b> reduction in LPIPS compared to the method without using the semantic maps and a <b>6.9%</b> reduction in FID for the method with semantic maps, respectively, and also shows higher realistic qualitative results.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 3","pages":"665-678"},"PeriodicalIF":8.4000,"publicationDate":"2023-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12255","citationCount":"0","resultStr":"{\"title\":\"Precise region semantics-assisted GAN for pose-guided person image generation\",\"authors\":\"Ji Liu, Zhenyu Weng, Yuesheng Zhu\",\"doi\":\"10.1049/cit2.12255\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Generating a realistic person's image from one source pose conditioned on another different target pose is a promising computer vision task. The previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but rarely investigate the region-based human semantic information. Some current methods that adopt the parsing map neither consider the precise local pose-semantic matching issues nor the correspondence between two different poses. In this study, a Region Semantics-Assisted Generative Adversarial Network (RSA-GAN) is proposed for the pose-guided person image generation task. In particular, a regional pose-guided semantic fusion module is first developed to solve the imprecise match issue between the semantic parsing map from a certain source image and the corresponding keypoints in the source pose. To well align the style of the human in the source image with the target pose, a pose correspondence guided style injection module is designed to learn the correspondence between the source pose and the target pose. In addition, one gated depth-wise convolutional cross-attention based style integration module is proposed to distribute the well-aligned coarse style information together with the precisely matched pose-guided semantic information towards the target pose. The experimental results indicate that the proposed RSA-GAN achieves a <b>23%</b> reduction in LPIPS compared to the method without using the semantic maps and a <b>6.9%</b> reduction in FID for the method with semantic maps, respectively, and also shows higher realistic qualitative results.</p>\",\"PeriodicalId\":46211,\"journal\":{\"name\":\"CAAI Transactions on Intelligence Technology\",\"volume\":\"9 3\",\"pages\":\"665-678\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2023-08-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12255\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CAAI Transactions on Intelligence Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12255\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12255","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Precise region semantics-assisted GAN for pose-guided person image generation
Generating a realistic person's image from one source pose conditioned on another different target pose is a promising computer vision task. The previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but rarely investigate the region-based human semantic information. Some current methods that adopt the parsing map neither consider the precise local pose-semantic matching issues nor the correspondence between two different poses. In this study, a Region Semantics-Assisted Generative Adversarial Network (RSA-GAN) is proposed for the pose-guided person image generation task. In particular, a regional pose-guided semantic fusion module is first developed to solve the imprecise match issue between the semantic parsing map from a certain source image and the corresponding keypoints in the source pose. To well align the style of the human in the source image with the target pose, a pose correspondence guided style injection module is designed to learn the correspondence between the source pose and the target pose. In addition, one gated depth-wise convolutional cross-attention based style integration module is proposed to distribute the well-aligned coarse style information together with the precisely matched pose-guided semantic information towards the target pose. The experimental results indicate that the proposed RSA-GAN achieves a 23% reduction in LPIPS compared to the method without using the semantic maps and a 6.9% reduction in FID for the method with semantic maps, respectively, and also shows higher realistic qualitative results.
期刊介绍:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.