FaTNET: Feature-alignment transformer network for human pose transfer

IF 7.6 | CAS Tier 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pattern Recognition | Pub Date: 2025-04-05 | DOI: 10.1016/j.patcog.2025.111626
Yu Luo, Chengzhi Yuan, Lin Gao, Weiwei Xu, Xiaosong Yang, Pengjie Wang
{"title":"FaTNET: Feature-alignment transformer network for human pose transfer","authors":"Yu Luo ,&nbsp;Chengzhi Yuan ,&nbsp;Lin Gao ,&nbsp;Weiwei Xu ,&nbsp;Xiaosong Yang ,&nbsp;Pengjie Wang","doi":"10.1016/j.patcog.2025.111626","DOIUrl":null,"url":null,"abstract":"<div><div>Pose-guided person image generation involves converting an image of a person from a source pose to a target pose. This task presents significant challenges due to the extensive variability and occlusion. Existing methods heavily rely on CNN-based architectures, which are constrained by their local receptive fields and often struggle to preserve the details of style and shape. To address this problem, we propose a novel framework for human pose transfer with transformers, which can employ global dependencies and keep local features as well. The proposed framework consists of transformer encoder, feature alignment network and transformer synthetic network, enabling the generation of realistic person images with desired poses. The core idea of our framework is to obtain a novel prior image aligned with the target image through the feature alignment network in the embedded and disentangled feature space, and then synthesize the final fine image through the transformer synthetic network by recurrently warping the result of previous stage with the correlation matrix between aligned features and source images. In contrast to previous convolution and non-local methods, ours can employ the global receptive field and preserve detail features as well. The results of qualitative and quantitative experiments demonstrate the superiority of our model in human pose transfer.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111626"},"PeriodicalIF":7.6000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325002869","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Pose-guided person image generation converts an image of a person from a source pose to a target pose. The task is challenging because of extensive pose variability and occlusion. Existing methods rely heavily on CNN-based architectures, which are constrained by their local receptive fields and often struggle to preserve details of style and shape. To address this problem, we propose a novel transformer-based framework for human pose transfer that models global dependencies while preserving local features. The proposed framework consists of a transformer encoder, a feature alignment network, and a transformer synthetic network, enabling the generation of realistic person images in desired poses. The core idea of our framework is to obtain a novel prior image aligned with the target image through the feature alignment network in an embedded, disentangled feature space, and then to synthesize the final refined image through the transformer synthetic network by recurrently warping the result of the previous stage with the correlation matrix between the aligned features and the source image. In contrast to previous convolutional and non-local methods, ours exploits a global receptive field while also preserving detailed features. Qualitative and quantitative experiments demonstrate the superiority of our model for human pose transfer.
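The core mechanism the abstract describes, warping source features with a correlation matrix computed against aligned (prior) features and repeating this over synthesis stages, is essentially a cross-attention soft warp. The following is a minimal PyTorch sketch of that idea under our own assumptions; the names (`CorrelationWarp`, `feat_dim`, `num_stages`) and shapes are illustrative and do not come from the authors' published implementation.

```python
import torch
import torch.nn as nn

class CorrelationWarp(nn.Module):
    """Warps source features toward aligned (prior) features via a
    correlation matrix, i.e. a cross-attention soft warp.
    Shapes and names are illustrative assumptions, not the paper's API."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.query_proj = nn.Linear(feat_dim, feat_dim)  # queries from aligned features
        self.key_proj = nn.Linear(feat_dim, feat_dim)    # keys from source features
        self.scale = feat_dim ** -0.5

    def forward(self, aligned_feat: torch.Tensor, source_feat: torch.Tensor) -> torch.Tensor:
        # aligned_feat, source_feat: (B, N, C), spatial grids flattened to N tokens
        q = self.query_proj(aligned_feat)
        k = self.key_proj(source_feat)
        # Correlation matrix between aligned and source features, shape (B, N, N).
        corr = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        # Soft warp: each target location gathers the source features it matches.
        return corr @ source_feat


def recurrent_synthesis(aligned_feat: torch.Tensor,
                        source_feat: torch.Tensor,
                        num_stages: int = 3) -> torch.Tensor:
    """Recurrently warps the previous stage's result, as the abstract describes."""
    warp = CorrelationWarp(aligned_feat.size(-1))
    out = source_feat
    for _ in range(num_stages):
        out = warp(aligned_feat, out)
    return out


# Toy usage: batch of 2, a 16x16 feature grid flattened to 256 tokens, 256 channels.
aligned = torch.randn(2, 256, 256)
source = torch.randn(2, 256, 256)
refined = recurrent_synthesis(aligned, source)  # (2, 256, 256)
```

Because every target token attends to every source token, this soft warp has the global receptive field the abstract contrasts with local convolution, while the gathered features themselves remain local details from the source image.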
Source journal: Pattern Recognition
Subject category: Engineering Technology (Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
Latest articles in this journal:
Discussion on "Interpretable medical deep framework by logits-constraint attention guiding graph-based multi-scale fusion for Alzheimer's disease analysis" by J. Xu, C. Yuan, X. Ma, H. Shang, X. Shi & X. Zhu (Pattern Recognition, vol. 152, 2024)
3D temporal-spatial convolutional LSTM network for assessing drug addiction treatment
Pairwise joint symmetric uncertainty based on macro-neighborhood entropy for heterogeneous feature selection
Low-rank fused modality assisted magnetic resonance imaging reconstruction via an anatomical variation adaptive transformer
LCF3D: A robust and real-time late-cascade fusion framework for 3D object detection in autonomous driving