Real-time facial reconstruction and expression replacement based on neural radiation field
Authors: Shenning Zhang, Hui Li, Xuefeng Tian
Journal: Systems and Soft Computing, vol. 7, Article 200185
Published: 2025-01-04
DOI: 10.1016/j.sasc.2025.200185
URL: https://www.sciencedirect.com/science/article/pii/S2772941925000031
Citations: 0
Abstract
Neural Radiance Fields (NeRF) have made high-fidelity 3D facial reconstruction and novel view synthesis possible and have become an important tool in 3D vision. However, existing manipulation approaches require substantial human effort: users must supply semantic masks or search for attributes manually, which is inconvenient for non-experts. Our approach enables the manipulation of NeRF-reconstructed faces from a single text input. To this end, this paper first trains a scene manipulator, a conditional version of NeRF with deformable latent codes, on dynamic scenes so that facial deformations can be controlled through the latent codes. A single latent code for the whole scene, however, is poorly suited to synthesizing local deformations across different contexts. This paper therefore proposes a text-driven manipulation pipeline for NeRF-based facial reconstruction, develops a manipulation network that learns to represent scene changes with latent codes that vary across spatial locations, and integrates a WeChat mini-program to support practical use. With this application, even non-expert users can easily synthesize novel views. The method gives users a simple and convenient text-driven way to manipulate 3D reconstructed faces.
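The idea of latent codes that vary across spatial locations can be illustrated with a small sketch. The abstract does not say how the codes are made position-dependent, so the mechanism below is an assumption chosen for illustration: each 3D point gets a softmax-weighted blend of a few anchor codes. The names `blend_weights` and `latent_at`, the hand-picked weights, and the "brow" and "mouth" sample points are all hypothetical; in a trained system a learned network would take the place of the fixed linear scores.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blend_weights(point, weight_rows):
    """Hypothetical stand-in for a learned network:
    one linear score per anchor code, followed by a softmax."""
    scores = [sum(w * x for w, x in zip(row, point)) for row in weight_rows]
    return softmax(scores)

def latent_at(point, anchors, weight_rows):
    """Position-dependent latent code as a convex combination of anchor codes."""
    weights = blend_weights(point, weight_rows)
    dim = len(anchors[0])
    return [sum(w * a[d] for w, a in zip(weights, anchors)) for d in range(dim)]

# Three illustrative anchor codes of dimension 4 (assumed values).
anchors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# One row of (x, y, z) weights per anchor (assumed values):
# anchor 0 is favored at high y, anchor 1 at low y, anchor 2 is neutral.
weight_rows = [
    [0.0, 4.0, 0.0],
    [0.0, -4.0, 0.0],
    [0.0, 0.0, 0.0],
]

code_brow = latent_at([0.0, 0.4, 0.1], anchors, weight_rows)
code_mouth = latent_at([0.0, -0.3, 0.1], anchors, weight_rows)
print("brow :", [round(c, 3) for c in code_brow])
print("mouth:", [round(c, 3) for c in code_mouth])
```

Because the two sample points receive different codes, a conditional NeRF fed these codes could, in principle, deform one region without applying the same change everywhere, which is the limitation of a single global code that the abstract points out.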