L2M-GAN: Learning to Manipulate Latent Space Semantics for Facial Attribute Editing

Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang
{"title":"L2M-GAN: Learning to Manipulate Latent Space Semantics for Facial Attribute Editing","authors":"Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, T. Xiang","doi":"10.1109/CVPR46437.2021.00297","DOIUrl":null,"url":null,"abstract":"A deep facial attribute editing model strives to meet two requirements: (1) attribute correctness – the target attribute should correctly appear on the edited face image; (2) irrelevance preservation – any irrelevant information (e.g., identity) should not be changed after editing. Meeting both requirements challenges the state-of-the-art works which resort to either spatial attention or latent space factorization. Specifically, the former assume that each attribute has well-defined local support regions; they are often more effective for editing a local attribute than a global one. The latter factorize the latent space of a fixed pretrained GAN into different attribute-relevant parts, but they cannot be trained end-to-end with the GAN, leading to sub-optimal solutions. To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and effective for editing both local and global attributes. The key novel components are: (1) A latent space vector of the GAN is factorized into an attribute-relevant and irrelevant codes with an orthogonality constraint imposed to ensure disentanglement. (2) An attribute-relevant code transformer is learned to manipulate the attribute value; crucially, the transformed code are subject to the same orthogonality constraint. By forcing both the original attribute-relevant latent code and the edited code to be disentangled from any attribute-irrelevant code, our model strikes the perfect balance between attribute correctness and irrelevance preservation. Extensive experiments on CelebA-HQ show that our L2M-GAN achieves significant improvements over the state-of-the-arts.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR46437.2021.00297","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35

Abstract

A deep facial attribute editing model strives to meet two requirements: (1) attribute correctness – the target attribute should correctly appear on the edited face image; (2) irrelevance preservation – any irrelevant information (e.g., identity) should not be changed after editing. Meeting both requirements challenges state-of-the-art works, which resort to either spatial attention or latent space factorization. Specifically, the former assume that each attribute has well-defined local support regions; they are often more effective for editing a local attribute than a global one. The latter factorize the latent space of a fixed pretrained GAN into different attribute-relevant parts, but they cannot be trained end-to-end with the GAN, leading to sub-optimal solutions. To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and is effective for editing both local and global attributes. The key novel components are: (1) A latent space vector of the GAN is factorized into an attribute-relevant code and an attribute-irrelevant code, with an orthogonality constraint imposed to ensure disentanglement. (2) An attribute-relevant code transformer is learned to manipulate the attribute value; crucially, the transformed code is subject to the same orthogonality constraint. By forcing both the original attribute-relevant latent code and the edited code to be disentangled from any attribute-irrelevant code, our model strikes the perfect balance between attribute correctness and irrelevance preservation. Extensive experiments on CelebA-HQ show that our L2M-GAN achieves significant improvements over state-of-the-art methods.
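To make the factorization concrete, below is a minimal PyTorch sketch (not the authors' released implementation) of how a latent style code could be split into attribute-irrelevant and attribute-relevant parts, edited, and regularized with an orthogonality constraint. The module names (DomainTransformer, orthogonality_loss), the residual-style decomposition, and the 512-dimensional code are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainTransformer(nn.Module):
    """Hypothetical sketch of the L2M-GAN idea: split a latent style code s
    into an attribute-irrelevant part s_ir and an attribute-relevant part
    s_r, edit only s_r, and recombine the two parts."""

    def __init__(self, dim=512):
        super().__init__()
        # Decomposer: predicts the attribute-irrelevant component of s.
        self.decomposer = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Transformer: maps the attribute-relevant code to the target value.
        self.transformer = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, s):
        s_ir = self.decomposer(s)         # attribute-irrelevant code
        s_r = s - s_ir                    # attribute-relevant code (residual)
        s_r_edit = self.transformer(s_r)  # edited attribute-relevant code
        s_edit = s_ir + s_r_edit          # recombined, edited style code
        return s_edit, s_ir, s_r, s_r_edit

def orthogonality_loss(a, b):
    """Disentanglement penalty: push the cosine similarity between the
    attribute-irrelevant code and an attribute-relevant code toward zero."""
    cos = F.cosine_similarity(a, b, dim=-1)
    return (cos ** 2).mean()

# Usage sketch: apply the constraint to BOTH the original and the edited
# attribute-relevant codes, as the abstract requires.
s = torch.randn(8, 512)  # batch of latent style codes
model = DomainTransformer(512)
s_edit, s_ir, s_r, s_r_edit = model(s)
loss_ortho = orthogonality_loss(s_ir, s_r) + orthogonality_loss(s_ir, s_r_edit)
```

In this sketch the orthogonality penalty is applied twice, mirroring the abstract's point that both the original attribute-relevant code and its transformed version must remain disentangled from the attribute-irrelevant code.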