SCAM! Transferring humans between images with Semantic Cross Attention Modulation

Nicolas Dufour, David Picard, Vicky S. Kalogeiton
{"title":"SCAM! Transferring humans between images with Semantic Cross Attention Modulation","authors":"Nicolas Dufour, David Picard, Vicky S. Kalogeiton","doi":"10.48550/arXiv.2210.04883","DOIUrl":null,"url":null,"abstract":"A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.04883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
骗局!语义交叉注意调制在图像间转移人
最近大量的工作都是针对语义条件下的图像生成。这些方法大多集中在较窄的姿势转移任务上,而忽略了更具有挑战性的主题转移任务,即不仅要转移姿势,还要转移外观和背景。在这项工作中,我们引入了SCAM(语义交叉注意调制),这是一个系统,它在图像的每个语义区域(包括前景和背景)编码丰富多样的信息,从而实现精确生成,并强调细节。这是通过语义注意转换器编码器实现的,该编码器为每个语义区域提取多个潜在向量,相应的生成器通过使用语义交叉注意调制来利用这些多个潜在向量。它只使用重建设置进行训练,而受试者转移在测试时进行。我们的分析表明,我们提出的架构在编码每个语义区域的外观多样性方面是成功的。在idedesigner和CelebAMask-HD数据集上进行的大量实验表明,SCAM优于SEAN和SPADE;此外,它还开创了主体转移研究的新局面。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding Rethinking Confidence Calibration for Failure Prediction PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry Diverse Human Motion Prediction Guided by Multi-level Spatial-Temporal Anchors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1