MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field.

IEEE Transactions on Visualization and Computer Graphics. Pub Date: 2024-10-08. DOI: 10.1109/TVCG.2024.3476331

Zijiang Yang, Zhongwei Qiu, Chang Xu, Dongmei Fu


Citation count: 0

Abstract

3D style transfer aims to generate stylized views of a 3D scene in a specified style, which requires both high-quality generation and multi-view consistency. Existing methods still struggle with high-quality stylization that preserves texture details and with stylization under multimodal guidance. In this paper, we reveal that the common way of training a stylized NeRF, which generates stylized multi-view supervision with 2D style transfer models, causes the same object to appear in different states (color tone, details, etc.) across the supervision views. This leads NeRF to smooth out texture details, which in turn results in low-quality rendering for 3D multi-style transfer. To tackle these problems, we propose a novel Multimodal-guided 3D Multi-style transfer of NeRF, termed MM-NeRF. First, MM-NeRF projects multimodal guidance into a unified space to keep the multimodal styles consistent, and extracts multimodal features to guide the 3D stylization. Second, a novel multi-head learning scheme is proposed to ease the difficulty of learning multi-style transfer, and a multi-view style consistent loss is proposed to track the inconsistency of multi-view supervision data. Finally, a novel incremental learning mechanism is proposed to generalize MM-NeRF to any new style at small cost. Extensive experiments on several real-world datasets show that MM-NeRF achieves high-quality 3D multi-style stylization with multimodal guidance while maintaining multi-view consistency and style consistency across the multimodal guidance.
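The smoothing effect described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or code: it assumes a single scalar color per surface point, ignores view-dependent effects, and uses hypothetical names (`stylized_view`, `mse_optimal_color`, `contrast`). It relies only on the standard fact that the value minimizing a sum of squared errors against several targets is their mean, so when a 2D stylizer places the same stripes at different positions in different views, the best per-point fit loses the stripes.

```python
def mse_optimal_color(targets):
    """Value minimizing the sum of squared errors against all targets (their mean)."""
    return sum(targets) / len(targets)


def contrast(signal):
    """Crude texture-detail measure: range between brightest and darkest value."""
    return max(signal) - min(signal)


def stylized_view(texture, shift):
    """Hypothetical stand-in for a 2D stylizer that shifts details per view."""
    n = len(texture)
    return [texture[(i + shift) % n] for i in range(n)]


# A 1D "texture" sampled along a surface: alternating dark and bright stripes.
texture = [0.0, 1.0] * 8

# Four supervision views of the same surface, each with its details
# placed inconsistently (assumed shifts of 0, 1, 2 and 3 samples).
views = [stylized_view(texture, shift) for shift in range(4)]

# Fit one color per surface point to all views under a squared-error loss.
fitted = [mse_optimal_color([view[i] for view in views])
          for i in range(len(texture))]

per_view_contrast = [contrast(view) for view in views]
fitted_contrast = contrast(fitted)
print(per_view_contrast, fitted_contrast)
```

Each supervision view keeps full stripe contrast, while the fitted signal is flat, which is the toy analogue of a radiance field averaging away texture details when its multi-view supervision disagrees.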
