骗局!语义交叉注意调制在图像间转移人

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision Pub Date : 2022-10-10 DOI:10.48550/arXiv.2210.04883

Nicolas Dufour, David Picard, Vicky S. Kalogeiton

{"title":"骗局!语义交叉注意调制在图像间转移人","authors":"Nicolas Dufour, David Picard, Vicky S. Kalogeiton","doi":"10.48550/arXiv.2210.04883","DOIUrl":null,"url":null,"abstract":"A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.","PeriodicalId":72676,"journal":{"name":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","volume":"1 1","pages":"713-729"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"SCAM! Transferring humans between images with Semantic Cross Attention Modulation\",\"authors\":\"Nicolas Dufour, David Picard, Vicky S. Kalogeiton\",\"doi\":\"10.48550/arXiv.2210.04883\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.\",\"PeriodicalId\":72676,\"journal\":{\"name\":\"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision\",\"volume\":\"1 1\",\"pages\":\"713-729\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2210.04883\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2210.04883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

最近大量的工作都是针对语义条件下的图像生成。这些方法大多集中在较窄的姿势转移任务上，而忽略了更具有挑战性的主题转移任务，即不仅要转移姿势，还要转移外观和背景。在这项工作中，我们引入了SCAM(语义交叉注意调制)，这是一个系统，它在图像的每个语义区域(包括前景和背景)编码丰富多样的信息，从而实现精确生成，并强调细节。这是通过语义注意转换器编码器实现的，该编码器为每个语义区域提取多个潜在向量，相应的生成器通过使用语义交叉注意调制来利用这些多个潜在向量。它只使用重建设置进行训练，而受试者转移在测试时进行。我们的分析表明，我们提出的架构在编码每个语义区域的外观多样性方面是成功的。在idedesigner和CelebAMask-HD数据集上进行的大量实验表明，SCAM优于SEAN和SPADE;此外，它还开创了主体转移研究的新局面。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SCAM! Transferring humans between images with Semantic Cross Attention Modulation

A large body of recent work targets semantically conditioned image generation. Most such methods focus on the narrower task of pose transfer and ignore the more challenging task of subject transfer that consists in not only transferring the pose but also the appearance and background. In this work, we introduce SCAM (Semantic Cross Attention Modulation), a system that encodes rich and diverse information in each semantic region of the image (including foreground and background), thus achieving precise generation with emphasis on fine details. This is enabled by the Semantic Attention Transformer Encoder that extracts multiple latent vectors for each semantic region, and the corresponding generator that exploits these multiple latents by using semantic cross attention modulation. It is trained only using a reconstruction setup, while subject transfer is performed at test time. Our analysis shows that our proposed architecture is successful at encoding the diversity of appearance in each semantic region. Extensive experiments on the iDesigner and CelebAMask-HD datasets show that SCAM outperforms SEAN and SPADE; moreover, it sets the new state of the art on subject transfer.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision

自引率

0.00%

发文量