Symmetrical Siamese Network for pose-guided person synthesis
Quanwei Yang, Lingyun Yu, Fengyuan Liu, Yun Song, Meng Shao, Guoqing Jin, Hongtao Xie
Computer Vision and Image Understanding, Volume 248, Article 104134. Published 2024-08-28. Impact factor: 4.3 (JCR Q2, Computer Science, Artificial Intelligence).
DOI: 10.1016/j.cviu.2024.104134
URL: https://www.sciencedirect.com/science/article/pii/S1077314224002157
Citations: 0
Abstract
Pose-Guided Person Image Synthesis (PGPIS) aims to generate a realistic person image that preserves the appearance of the source person while adopting the target pose. Diverse appearances and drastic pose changes make this task highly challenging. Because existing models make insufficient use of paired data, they have difficulty accurately preserving source appearance details and high-frequency textures in the generated images. Meanwhile, although currently popular AdaIN-based methods handle drastic pose changes well, they struggle to capture diverse clothing shapes because they rely on global feature statistics. To address these issues, we propose a novel Symmetrical Siamese Network (SSNet) for PGPIS, which consists of two synergistic symmetrical generative branches that leverage prior knowledge of paired data to comprehensively exploit appearance details. For feature integration, we propose a Style Matching Module (SMM) that transfers multi-level region appearance styles and gradient information to the desired pose to enrich high-frequency textures. Furthermore, to overcome the limitation of global feature statistics, a Spatial Attention Module (SAM) is introduced to complement the SMM in capturing clothing shapes. Extensive experiments demonstrate the effectiveness of SSNet, which achieves state-of-the-art results on public datasets. Moreover, SSNet can also edit source appearance attributes, making it versatile across a wider range of application scenarios.
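The limitation of global feature statistics mentioned in the abstract can be illustrated with a minimal sketch of the standard AdaIN operation, which re-scales each content channel to the per-channel mean and standard deviation of a style feature map. This is not the paper's SMM or SAM; the function name `adain`, the nested-list layout `[channels][height][width]`, the `eps` value, and the toy inputs are all assumptions chosen for illustration.

```python
from statistics import fmean, pstdev


def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps stored as [channels][height][width] lists.

    Each content channel is normalized with its own mean/std, then re-scaled
    with the mean/std of the matching style channel. Only these two global
    numbers per channel are taken from the style features.
    """
    out = []
    for c_map, s_map in zip(content, style):
        c_vals = [v for row in c_map for v in row]
        s_vals = [v for row in s_map for v in row]
        c_mu, c_sigma = fmean(c_vals), pstdev(c_vals)
        s_mu, s_sigma = fmean(s_vals), pstdev(s_vals)
        out.append(
            [[s_sigma * (v - c_mu) / (c_sigma + eps) + s_mu for v in row]
             for row in c_map]
        )
    return out


# Toy single-channel inputs (hypothetical values).
content = [[[1.0, 2.0], [3.0, 4.0]]]

# Two style maps with the same values arranged differently in space:
# a horizontal split versus a vertical split.
style_a = [[[0.0, 0.0], [4.0, 4.0]]]
style_b = [[[0.0, 4.0], [0.0, 4.0]]]

out_a = adain(content, style_a)
out_b = adain(content, style_b)

# The spatial arrangement of the style map has no effect on the result,
# because only the channel-wise mean and std are used.
layout_ignored = out_a == out_b
print(layout_ignored)
```

Because the two style maps share the same per-channel mean and standard deviation, the sketch treats them identically even though their spatial layouts differ. This is one way to read the abstract's claim that global statistics alone make it hard to capture shape, and why a spatially aware component is proposed as a complement.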
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems