SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

IEEE Transactions on Pattern Analysis and Machine Intelligence · IF 18.6 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2023-02-02 · DOI: 10.48550/arXiv.2302.01330

Zhaoxi Chen, Guangcong Wang, Ziwei Liu


Citations: 12

Abstract

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes that synthesizes large-scale 3D landscapes from random noise. Our framework is learned only from in-the-wild 2D image collections, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid that parameterizes the latent space based on 3D positions and scene semantics, aiming to encode features that generalize across scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds. The project page is available at https://scene-dreamer.github.io/. Code is available at https://github.com/FrozenBurning/SceneDreamer.
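The BEV representation described in the abstract can be illustrated with a small, self-contained sketch. It is not the SceneDreamer implementation: it substitutes 2D value noise for the simplex noise named in the abstract, and the label set, thresholds, and function names (`make_bev`, `query`, `storage_cost`) are hypothetical. It shows how a height field and a separate semantic field, each stored on an N×N grid, are enough to define a 3D scene, and why storage grows quadratically rather than with the full N×N×H volume.

```python
import random

# Hypothetical label set; the abstract does not list the paper's semantic classes.
WATER, SAND, GRASS, FOREST, ROCK, SNOW = range(6)


def value_noise(size, cell, seed):
    """2D value noise on a size x size grid (a simple stand-in for simplex noise).

    Random values sit on a coarse lattice spaced `cell` pixels apart and are
    blended with smoothstep-weighted bilinear interpolation, so outputs stay in [0, 1).
    """
    rng = random.Random(seed)
    n = size // cell + 2  # lattice points needed to cover every pixel
    lattice = [[rng.random() for _ in range(n)] for _ in range(n)]

    def smooth(t):
        return t * t * (3.0 - 2.0 * t)

    out = [[0.0] * size for _ in range(size)]
    for i in range(size):
        gi, ti = divmod(i, cell)
        wi = smooth(ti / cell)
        for j in range(size):
            gj, tj = divmod(j, cell)
            wj = smooth(tj / cell)
            top = lattice[gi][gj] * (1 - wj) + lattice[gi][gj + 1] * wj
            bottom = lattice[gi + 1][gj] * (1 - wj) + lattice[gi + 1][gj + 1] * wj
            out[i][j] = top * (1 - wi) + bottom * wi
    return out


def fractal_noise(size, octaves, seed):
    """Sum octaves of value noise with halving amplitude and cell size, normalized to [0, 1)."""
    total = [[0.0] * size for _ in range(size)]
    amp, cell, norm = 1.0, max(size // 2, 1), 0.0
    for octave in range(octaves):
        layer = value_noise(size, cell, seed + octave)
        for i in range(size):
            for j in range(size):
                total[i][j] += amp * layer[i][j]
        norm += amp
        amp *= 0.5
        cell = max(cell // 2, 1)
    return [[v / norm for v in row] for row in total]


def label_for(elevation, moisture):
    """Hypothetical rule: pick a label from normalized elevation plus an independent moisture map."""
    if elevation < 0.35:
        return WATER
    if elevation < 0.40:
        return SAND
    if elevation > 0.75:
        return SNOW
    if elevation > 0.60:
        return ROCK
    return FOREST if moisture > 0.5 else GRASS


def make_bev(size=64, max_height=32, seed=0):
    """Build a BEV scene: a height field and a semantic field, each on a size x size grid."""
    elevation = fractal_noise(size, octaves=4, seed=seed)
    moisture = fractal_noise(size, octaves=3, seed=seed + 100)
    height = [[e * max_height for e in row] for row in elevation]
    semantic = [
        [label_for(e, m) for e, m in zip(e_row, m_row)]
        for e_row, m_row in zip(elevation, moisture)
    ]
    return height, semantic


def query(height, semantic, i, j, z):
    """Occupancy and label of the 3D point (i, j, z): solid if z is at or below the column's surface."""
    if z <= height[i][j]:
        return True, semantic[i][j]
    return False, None


def storage_cost(size, max_height):
    """Stored values: BEV keeps 2 per column (O(N^2)); a dense voxel grid keeps 1 per voxel (O(N^2 H))."""
    return 2 * size * size, size * size * max_height


if __name__ == "__main__":
    height, semantic = make_bev(size=64, max_height=32, seed=0)
    print(storage_cost(64, 32))  # (8192, 131072)
    print(query(height, semantic, 10, 20, 5.0))
```

Because each column's surface height and label are read from 2D maps, the occupancy and semantics of any 3D point come from two lookups. The ratio between a dense voxel grid and the BEV maps is H/2 (16 in the example above), so the savings grow with vertical resolution; when H scales with N, the voxel grid is cubic while the BEV stays quadratic.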


Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 28.40

Self-citation rate: 3.00%

Articles published: 885

Review time: 8.5 months

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.