Learning layout generation for virtual worlds

IF 18.3 · Region 3 (Computer Science) · JCR Q1 (Computer Science, Software Engineering) · Computational Visual Media · Pub Date: 2024-05-02 · DOI: 10.1007/s41095-023-0365-1

Weihao Cheng, Ying Shan

Citations: 0

Abstract

The emergence of the metaverse has led to rapidly increasing demand for the generation of extensive 3D worlds. We consider that an engaging world is built upon a rational layout of multiple land-use areas (e.g., forest, meadow, and farmland). To this end, we propose a generative model of land-use distribution that learns from geographic data. The model is based on a transformer architecture that generates a 2D map of the land-use layout and can be conditioned on spatial controls, semantic controls, or both, depending on which are provided. The model enables diverse layout generation under user control, as well as layout expansion, in which borders are extended from partial inputs. To generate high-quality and satisfactory layouts, we devise a geometric objective function that supervises the model to perceive layout shapes and regularize generations using geometric priors. Additionally, we devise a planning objective function that supervises the model to perceive progressive composition demands and suppress generations that deviate from the controls. To evaluate the spatial distribution of the generations, we train an autoencoder to embed land-use layouts into vectors, which enables comparison between real and generated data using the Wasserstein metric, an approach inspired by the Fréchet inception distance.
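The evaluation idea in the abstract (embed layouts, then compare real and generated embeddings with a Wasserstein-type metric in the spirit of the Fréchet inception distance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each set of embeddings is summarized by a Gaussian, as FID does, and the function name `frechet_distance` and the random arrays standing in for autoencoder embeddings are hypothetical.

```python
import numpy as np


def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """Squared 2-Wasserstein distance between Gaussians fitted to two embedding sets.

    Assumption (not stated in the source): embeddings are modeled as Gaussians,
    as in the Frechet inception distance. Inputs have shape (num_samples, dim).
    """
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real_emb, rowvar=False))
    cov_g = np.atleast_2d(np.cov(gen_emb, rowvar=False))

    # Tr((cov_r cov_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative when both
    # covariances are positive semi-definite. Clipping guards against tiny
    # negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(np.sum((mu_r - mu_g) ** 2))
    cov_term = float(np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
    return mean_term + cov_term


if __name__ == "__main__":
    # Placeholder data standing in for autoencoder embeddings of layouts.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))
    generated = rng.normal(loc=0.5, size=(500, 8))
    print(frechet_distance(real, generated))
```

Under the Gaussian assumption, the score is zero when the two sets have the same mean and covariance and grows as either drifts apart, so lower values indicate generated layouts whose embedding distribution is closer to the real data.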

Source journal

Computational Visual Media (Computer Science: Computer Graphics and Computer-Aided Design)

CiteScore: 16.90
Self-citation rate: 5.80%
Articles published: 243
Review time: 6 weeks

Journal description: Computational Visual Media is a peer-reviewed open access journal. It publishes original high-quality research papers and significant review articles on novel ideas, methods, and systems relevant to visual media. Computational Visual Media publishes articles that focus on, but are not limited to, the following areas:

• Editing and composition of visual media
• Geometric computing for images and video
• Geometry modeling and processing
• Machine learning for visual media
• Physically based animation
• Realistic rendering
• Recognition and understanding of visual media
• Visual computing for robotics
• Visualization and visual analytics

Other interdisciplinary research into visual media that combines aspects of computer graphics, computer vision, image and video processing, geometric computing, and machine learning is also within the journal's scope. This is an open access journal, published quarterly by Tsinghua University Press and Springer. The open access fees (article-processing charges) are fully sponsored by Tsinghua University, China. Authors can publish in the journal without any additional charges.

Latest articles from the journal

• TrafPS: A shapley-based visual analytics approach to interpret traffic
• CLIP-Flow: Decoding images encoded in CLIP space
• CLIP-SP: Vision-language model with adaptive prompting for scene parsing
• SGformer: Boosting transformers for indoor lighting estimation from a single image
• Central similarity consistency hashing for asymmetric image retrieval