Semantic Implicit Neural Scene Representations With Semi-Supervised Training

Amit Kohli, V. Sitzmann, Gordon Wetzstein
{"title":"Semantic Implicit Neural Scene Representations With Semi-Supervised Training","authors":"Amit Kohli, V. Sitzmann, Gordon Wetzstein","doi":"10.1109/3DV50981.2020.00052","DOIUrl":null,"url":null,"abstract":"The recent success of implicit neural scene representations has presented a viable new method for how we capture and store 3D scenes. Unlike conventional 3D representations, such as point clouds, which explicitly store scene properties in discrete, localized units, these implicit representations encode a scene in the weights of a neural network which can be queried at any coordinate to produce these same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry information in a scene. We take the next step and demonstrate that an existing implicit representation (SRNs) [67] is actually multi-modal; it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we utilize a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks in order to achieve dense 3D semantic segmentation. We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well as 3D interpolation of appearance and semantics.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 International Conference on 3D Vision (3DV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/3DV50981.2020.00052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

The recent success of implicit neural scene representations has presented a viable new way to capture and store 3D scenes. Unlike conventional 3D representations such as point clouds, which explicitly store scene properties in discrete, localized units, an implicit representation encodes a scene in the weights of a neural network that can be queried at any coordinate to produce those same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry of a scene. We take the next step and demonstrate that an existing implicit representation, scene representation networks (SRNs) [67], is in fact multi-modal: it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we apply a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple and general, and requires only a few tens of labeled 2D segmentation masks to achieve dense 3D semantic segmentation. We explore two novel applications of this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, and 3D interpolation of appearance and semantics.
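The mechanism the abstract describes, querying a pre-trained coordinate network for per-point features and fitting only a small semantic head on a handful of labeled masks, can be sketched in a few lines. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the actual SRNs pipeline uses a hypernetwork-conditioned MLP and differentiable ray marching to find surface points, which are replaced here by a plain coordinate MLP and random stand-in surface samples. All names (`SceneMLP`, `seg_head`, the 10-class count) are hypothetical.

```python
# Minimal sketch of the semi-supervised recipe described above -- NOT the
# authors' code. A real SRN conditions the MLP via a hypernetwork and finds
# surface points with differentiable ray marching [67]; both are faked here.
import torch
import torch.nn as nn

class SceneMLP(nn.Module):
    """Implicit representation: maps a 3D coordinate to a feature vector."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )

    def forward(self, xyz):            # xyz: (N, 3) query coordinates
        return self.net(xyz)           # (N, hidden) per-point features

backbone = SceneMLP()                  # stands in for the pre-trained SRN
rgb_head = nn.Linear(256, 3)           # appearance head from pre-training
seg_head = nn.Linear(256, 10)          # NEW: small per-point semantic head

# Semi-supervised step: freeze everything learned from RGB supervision and
# fit only the segmentation head on a few tens of labeled 2D masks.
for p in backbone.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(seg_head.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

# Hypothetical data: the 3D surface point hit by the camera ray through each
# labeled mask pixel, plus that pixel's class. In the paper these points come
# from the SRN ray marcher; here they are random stand-ins.
surface_xyz = torch.randn(1024, 3)
labels = torch.randint(0, 10, (1024,))

for step in range(100):
    feats = backbone(surface_xyz)      # frozen features, one per point
    logits = seg_head(feats)           # per-point class scores
    loss = ce(logits, labels)
    opt.zero_grad()
    loss.backward()                    # gradients reach only seg_head
    opt.step()
```

Because the frozen features are defined at every 3D coordinate, the fitted head can label arbitrary points in the scene volume, which is what lets a few 2D masks supervise dense, multi-view-consistent 3D segmentation.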