High Fidelity Makeup via 2D and 3D Identity Preservation Net

ACM Transactions on Multimedia Computing Communications and Applications. Pub Date: 2024-04-08. DOI: 10.1145/3656475. Impact factor: 6. Zone 3, Computer Science. JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS.

Jinliang Liu, Zhedong Zheng, Zongxin Yang, Yi Yang


Citations: 0

Abstract

In this paper, we address the challenging makeup transfer task, aiming to transfer makeup from a reference image to a source image while preserving facial geometry and background consistency. Existing deep neural network-based methods have shown promising results in aligning facial parts and transferring makeup textures. However, they often neglect the facial geometry of the source image, leading to two adverse effects: (1) alterations in geometrically relevant facial features, causing face flattening and loss of personality, and (2) difficulties in maintaining background consistency, as networks cannot clearly determine the face-background boundary. To jointly tackle these issues, we propose the High Fidelity Makeup via 2D and 3D Identity Preservation Network (IP23-Net), a novel framework that leverages facial geometry information to generate more realistic results. Our method comprises a 3D Shape Identity Encoder, which extracts identity and 3D shape features. We incorporate a 3D face reconstruction model to ensure the three-dimensional effect of face makeup, thereby preserving the characters’ depth and natural appearance. To preserve background consistency, our Background Correction Decoder automatically predicts an adaptive mask for the source image, distinguishing the foreground and background. In addition to popular benchmarks, we introduce a new large-scale High Resolution Synthetic Makeup Dataset containing 335,230 diverse high-resolution face images, to evaluate our method’s generalization ability. Experiments demonstrate that IP23-Net achieves high-fidelity makeup transfer while effectively preserving background consistency. The code will be made publicly available.
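
The abstract states that the Background Correction Decoder predicts an adaptive mask separating foreground from background, but it does not say how the mask is applied. One common way such a mask can be used is soft alpha compositing, sketched below. This is an illustrative assumption, not the authors' implementation: the function name `composite_with_mask`, the array layout, and the blending rule are all hypothetical.

```python
import numpy as np


def composite_with_mask(source, generated, mask):
    """Blend a generated image into the source image with a soft mask.

    Hypothetical helper, not taken from the paper.
    source, generated: float arrays of shape (H, W, 3) with values in [0, 1].
    mask: float array of shape (H, W); 1 marks foreground (face), 0 background.
    """
    source = np.asarray(source, dtype=np.float64)
    generated = np.asarray(generated, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)

    if source.shape != generated.shape:
        raise ValueError("source and generated must have the same shape")
    if mask.shape != source.shape[:2]:
        raise ValueError("mask must have shape (H, W) matching the images")

    # Clamp the mask and add a channel axis so it broadcasts over RGB.
    m = np.clip(mask, 0.0, 1.0)[..., None]

    # Foreground pixels come from the generated image,
    # background pixels are copied from the source unchanged.
    return m * generated + (1.0 - m) * source
```

Under this assumed rule, any pixel where the mask is 0 is exactly the source pixel, which is one simple way to obtain the background consistency the abstract describes.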




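Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Articles published: 285

Review time: 7.5 months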

Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consists primarily of research papers. As an archival journal, it is intended that its papers have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.