Haisor: Human-Aware Indoor Scene Optimization via Deep Reinforcement Learning

IF 7.8; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING; ACM Transactions on Graphics; Pub Date: 2023-11-18; DOI: 10.1145/3632947

Jia-Mu Sun, Jie Yang, Kaichun Mo, Yu-Kun Lai, Leonidas Guibas, Lin Gao


Citations: 0

Abstract

3D scene synthesis benefits many real-world applications. Most scene generators aim to make indoor scenes plausible by learning from training data and by leveraging extra constraints such as adjacency and symmetry. Although the generated 3D scenes mostly have visually realistic layouts, they can be functionally unsuitable for human users, who need to navigate the scene and interact with the furniture. Our key observation is that human activity plays a critical role and that sufficient free space is essential for human-scene interaction. This is exactly where many existing synthesized scenes fail: layouts that look correct are often not fit for living. To tackle this, we present Haisor, a human-aware optimization framework for 3D indoor scene arrangement via reinforcement learning, which aims to find an action sequence that optimizes the indoor scene layout automatically. Based on a hierarchical scene graph representation, an optimal action sequence is predicted and performed via Deep Q-Learning with Monte Carlo Tree Search (MCTS), where MCTS is the key component for searching for the optimal solution over long-term sequences and a large action space. Multiple human-aware rewards are designed as the core criteria of human-scene interaction and guide the choice of the next action. The framework is optimized end-to-end given indoor scenes with part-level furniture layouts, including part mobility information. Furthermore, the methodology is extensible and allows different reward designs to be used to achieve personalized indoor scene synthesis. Extensive experiments demonstrate that our approach optimizes the layout of 3D indoor scenes in a human-aware manner, producing results that are more realistic and plausible than the original outputs of state-of-the-art generators, and that it selects better actions than alternative baselines.
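To make the idea of a free-space reward concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a 2D grid room, furniture as axis-aligned rectangles, and a reward equal to the share of free floor cells a person can reach from the door. A greedy one-step search stands in for Deep Q-Learning with MCTS, which the paper uses for long action sequences. All names (`reachable_free_ratio`, `best_action`, `DOOR`, and so on) are hypothetical.

```python
from collections import deque

# Toy room: W x H grid cells; each furniture item is a rectangle (x, y, w, h).
# Everything here is an illustrative assumption, not taken from the Haisor paper.
W, H = 6, 4
DOOR = (0, 0)
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def occupied(furniture):
    """Return the set of grid cells covered by any furniture item."""
    cells = set()
    for (x, y, w, h) in furniture:
        for i in range(x, x + w):
            for j in range(y, y + h):
                cells.add((i, j))
    return cells


def valid(furniture):
    """A layout is valid if items stay in the room, do not overlap, and leave the door free."""
    area = 0
    for (x, y, w, h) in furniture:
        if x < 0 or y < 0 or x + w > W or y + h > H:
            return False
        area += w * h
    occ = occupied(furniture)
    return len(occ) == area and DOOR not in occ


def reachable_free_ratio(furniture):
    """Toy human-aware reward: fraction of free cells reachable from the door (BFS)."""
    occ = occupied(furniture)
    free = W * H - len(occ)
    seen = {DOOR}
    queue = deque([DOOR])
    while queue:
        cx, cy = queue.popleft()
        for dx, dy in MOVES:
            nxt = (cx + dx, cy + dy)
            inside = 0 <= nxt[0] < W and 0 <= nxt[1] < H
            if inside and nxt not in occ and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) / free


def best_action(furniture):
    """Greedy one-step search: try moving each item by one cell, keep the best reward."""
    best = (reachable_free_ratio(furniture), None, list(furniture))
    for idx, (x, y, w, h) in enumerate(furniture):
        for dx, dy in MOVES:
            moved = list(furniture)
            moved[idx] = (x + dx, y + dy, w, h)
            if valid(moved):
                reward = reachable_free_ratio(moved)
                if reward > best[0]:
                    best = (reward, (idx, dx, dy), moved)
    return best


# A cabinet splits the room and a chair blocks the only remaining gap.
scene = [(2, 0, 1, 3), (3, 3, 1, 1)]
reward, action, new_scene = best_action(scene)
```

In this toy scene a single one-cell move is enough to reconnect the room. Layouts that need several coordinated moves are where a one-step greedy search falls short and a longer-horizon search becomes necessary.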




Source journal

ACM Transactions on Graphics (Engineering and Technology, Computer Science: Software Engineering)

CiteScore: 14.30

Self-citation rate: 25.80%

Articles published: 193

Review time: 12 months

Journal description: ACM Transactions on Graphics (TOG) is a peer-reviewed scientific journal that aims to disseminate the latest findings of note in the field of computer graphics. It has been published since 1982 by the Association for Computing Machinery. Starting in 2003, all papers accepted for presentation at the annual SIGGRAPH conference are printed in a special summer issue of the journal.
