Yu Luo, Chengzhi Yuan, Lin Gao, Weiwei Xu, Xiaosong Yang, Pengjie Wang
FaTNET: Feature-alignment transformer network for human pose transfer
Pattern Recognition, Volume 165, Article 111626. Published 2025-04-05.
DOI: 10.1016/j.patcog.2025.111626
https://www.sciencedirect.com/science/article/pii/S0031320325002869
Citation count: 0
Abstract
Pose-guided person image generation converts an image of a person from a source pose to a target pose. The task is challenging because of extensive variability and occlusion. Existing methods rely heavily on CNN-based architectures, which are constrained by local receptive fields and often struggle to preserve details of style and shape. To address this problem, we propose a novel transformer-based framework for human pose transfer that exploits global dependencies while retaining local features. The framework consists of a transformer encoder, a feature alignment network, and a transformer synthetic network, and generates realistic person images in the desired poses. Its core idea is to first obtain a novel prior image aligned with the target image through the feature alignment network, operating in the embedded and disentangled feature space, and then to synthesize the final fine image through the transformer synthetic network by recurrently warping the result of the previous stage with the correlation matrix between the aligned features and the source images. In contrast to previous convolutional and non-local methods, our approach uses a global receptive field while preserving detailed features. Qualitative and quantitative experiments demonstrate the superiority of our model in human pose transfer.
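
The abstract does not specify how the correlation matrix is built or applied, so the following is only a minimal sketch of one common way to warp source content with a feature correlation matrix: cosine similarity followed by a softmax over source positions. The function names, the `temperature` parameter, and the toy inputs are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

def l2_normalize(vec, eps=1e-8):
    # Scale a feature vector to (approximately) unit length.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def correlation_warp(aligned_feats, source_feats, source_values, temperature=0.01):
    """Hypothetical helper: warp source values onto the aligned positions.

    aligned_feats: N feature vectors for the target-aligned positions
    source_feats:  M feature vectors for the source positions
    source_values: M value vectors (e.g. pixels or features) to be warped
    """
    aligned = [l2_normalize(v) for v in aligned_feats]
    source = [l2_normalize(v) for v in source_feats]
    dim = len(source_values[0])
    warped = []
    for query in aligned:
        # One row of the correlation matrix: cosine similarity to every source position.
        row = [sum(q * k for q, k in zip(query, key)) for key in source]
        # Assumed design choice: a low temperature makes the matching nearly one-to-one.
        weights = softmax([c / temperature for c in row])
        warped.append([
            sum(w * value[d] for w, value in zip(weights, source_values))
            for d in range(dim)
        ])
    return warped

# Toy example: each aligned position matches exactly one source position.
aligned = [[1.0, 0.0], [0.0, 1.0]]
source = [[0.0, 2.0], [3.0, 0.0]]
values = [[10.0], [20.0]]
warped = correlation_warp(aligned, source, values)
```

In this toy case the first aligned position correlates with the second source position and the second with the first, so the warped values swap order. A recurrent scheme like the one the abstract describes would presumably apply such a warp to the previous stage's output at each step, but those details are not given here.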
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.