
2020 International Conference on 3D Vision (3DV): Latest Publications

Deep Sketch-Based Modeling: Tips and Tricks
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00064
Yue Zhong, Yulia Gryaditskaya, Honggang Zhang, Yi-Zhe Song
Deep image-based modeling has received a great deal of attention in recent years, yet the parallel problem of sketch-based modeling has only been studied briefly, often as a potential application. In this work, for the first time, we identify the main differences between sketch and image inputs: (i) style variance, (ii) imprecise perspective, and (iii) sparsity. We discuss why each of these differences can pose a challenge, and even render a certain class of image-based methods inapplicable. We study alternative solutions to address each of these differences. By doing so, we derive a few important insights: (i) sparsity commonly results in an incorrect prediction of foreground versus background, (ii) the diversity of human drawing styles, if not taken into account, can lead to very poor generalization properties, and finally (iii) unless a dedicated sketching interface is used, one cannot expect sketches to match the perspective of a fixed viewpoint. Finally, we compare a set of representative deep single-image modeling solutions and show how their performance can be improved to handle sketch input by taking the identified critical differences into consideration.
Citations: 22
Adiabatic Quantum Graph Matching with Permutation Matrix Constraints
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00068
Marcel Seelbach Benkner, Vladislav Golyanik, C. Theobalt, Michael Moeller
Matching problems on 3D shapes and images are challenging, as they are frequently formulated as combinatorial quadratic assignment problems (QAPs) with permutation matrix constraints, which are NP-hard. In this work, we address such problems with emerging quantum computing technology and propose several reformulations of QAPs as unconstrained problems suitable for efficient execution on quantum hardware. We investigate several ways to inject permutation matrix constraints into a quadratic unconstrained binary optimization (QUBO) problem which can be mapped to quantum hardware. We focus on obtaining a sufficient spectral gap, which further increases the probability of measuring optimal solutions and valid permutation matrices in a single run. We perform our experiments on the quantum computer D-Wave 2000Q (211 qubits, adiabatic). Despite the observed discrepancy between simulated adiabatic quantum computing and execution on real quantum hardware, our reformulation of permutation matrix constraints increases the robustness of the numerical computations over other penalty approaches in our experiments. The proposed algorithm has the potential to scale to higher dimensions on future quantum computing architectures, which opens up multiple new directions for solving matching problems in 3D computer vision and graphics.
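As a concrete illustration of the baseline penalty reformulation, the sketch below builds a QUBO for a tiny QAP by adding quadratic penalties that force every row and column of the permutation matrix to sum to one, with brute-force enumeration standing in for the quantum annealer. The function names and the uniform penalty weight are our own assumptions; the paper's contribution lies in alternative, more robust constraint injections beyond this naive penalty.

```python
import itertools
import numpy as np

def qap_qubo(A, B, penalty):
    """QUBO for the Koopmans-Beckmann QAP: minimize
    sum_{i,j,k,l} A[i,k] * B[j,l] * X[i,j] * X[k,l] over permutation
    matrices X, with X flattened to binary variables x[i*n + j]."""
    n = A.shape[0]
    Q = np.zeros((n * n, n * n))
    for i, j, k, l in itertools.product(range(n), repeat=4):
        Q[i * n + j, k * n + l] += A[i, k] * B[j, l]
    # Naive penalty injection: (row sum - 1)^2 and (column sum - 1)^2,
    # expanded using x^2 = x for binary variables (constants dropped).
    for i in range(n):
        for group in ([i * n + j for j in range(n)],      # row i
                      [j * n + i for j in range(n)]):     # column i
            for a in group:
                Q[a, a] -= penalty
                for b in group:
                    if a != b:
                        Q[a, b] += penalty
    return Q

def brute_force_minimize(Q):
    """Exhaustive stand-in for the annealer: minimize x^T Q x over binary x."""
    N = Q.shape[0]
    best_e, best_x = np.inf, None
    for bits in itertools.product([0, 1], repeat=N):
        x = np.array(bits)
        e = x @ Q @ x
        if e < best_e:
            best_e, best_x = e, x
    n = int(np.sqrt(N))
    return best_x.reshape(n, n)

# Toy 2x2 instance; the penalty must dominate the objective coefficients.
A = np.array([[0.0, 3.0], [3.0, 0.0]])
B = np.array([[0.0, 5.0], [5.0, 0.0]])
print(brute_force_minimize(qap_qubo(A, B, penalty=50.0)))
```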
Citations: 22
Intrinsic Autoencoders for Joint Deferred Neural Rendering and Intrinsic Image Decomposition
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00128
Hassan Abu Alhaija, Siva Karthik Mustikovela, Justus Thies, V. Jampani, M. Nießner, Andreas Geiger, C. Rother
Neural rendering techniques promise efficient photorealistic image synthesis while providing rich control over scene parameters by learning the physical image formation process. While several supervised methods have been proposed for this task, acquiring a dataset of images with accurately aligned 3D models is very difficult. The main contribution of this work is to lift this restriction by training a neural rendering algorithm from unpaired data. We propose an autoencoder for jointly generating realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties. In contrast to a traditional graphics pipeline, our approach does not require all scene properties, such as material parameters and lighting, to be specified by hand. Instead, we learn photo-realistic deferred rendering from a small set of 3D models and a larger set of unaligned real images, both of which are easy to acquire in practice. Simultaneously, we obtain accurate intrinsic decompositions of real images without requiring paired ground truth. Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
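The unpaired training signal hinges on re-composition: whatever the network splits a real photo into must multiply back to the input. A minimal sketch of that intrinsic bottleneck is shown below; the tiny convolutional heads and the Lambertian composition I = albedo * shading are illustrative placeholders, not the paper's architecture, which couples this with deferred rendering of synthetic models.

```python
import torch
import torch.nn as nn

class IntrinsicAE(nn.Module):
    """Toy intrinsic autoencoder: two decoders predict albedo and shading,
    and their product must reconstruct the input image."""
    def __init__(self):
        super().__init__()
        def head():
            return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())
        self.albedo = head()
        self.shading = head()

    def forward(self, img):
        A = self.albedo(img)
        S = self.shading(img)
        return A, S, A * S  # Lambertian re-composition

model = IntrinsicAE()
img = torch.rand(1, 3, 64, 64)          # stand-in for a real photo
A, S, recon = model(img)
loss = torch.mean((recon - img) ** 2)   # self-reconstruction term
loss.backward()
```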
Citations: 2
Underwater Scene Recovery Using Wavelength-Dependent Refraction of Light
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00013
S. Ishihara, Yuta Asano, Yinqiang Zheng, Imari Sato
This paper proposes a method for underwater depth estimation from an orthographic multispectral image. In accordance with Snell's law, incoming light is refracted when it enters the water surface, and its direction is determined by the refractive index and the normals of the water surface. The refractive index is wavelength-dependent, which leads to some disparity between images taken at different wavelengths. Given the camera orientation and the refractive index of a medium such as water, our approach can reconstruct the underwater scene with an unknown water surface from the disparity observed in images taken at different wavelengths. We verified the effectiveness of our method through simulations and real experiments on various scenes.
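The geometric core is the vector form of Snell's law applied per wavelength. The sketch below refracts one air-to-water ray at three wavelengths using approximate refractive indices for water (illustrative values); the small per-wavelength differences in the refracted direction are exactly the disparity the method exploits.

```python
import numpy as np

def refract(d, n, eta):
    """Vector Snell's law: refract unit direction d at a surface with unit
    normal n (pointing toward the incident side), eta = n_in / n_out."""
    cos_i = -np.dot(n, d)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

# Approximate wavelength-dependent refractive indices of water.
n_water = {450: 1.340, 550: 1.334, 650: 1.331}
d = np.array([0.0, -np.sin(np.radians(30.0)), -np.cos(np.radians(30.0))])
n = np.array([0.0, 0.0, 1.0])  # flat surface here; the paper handles unknown surfaces
for wl, nw in n_water.items():
    print(f"{wl} nm:", refract(d, n, 1.0 / nw))  # air -> water
```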
Citations: 2
Refractive Multi-view Stereo
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00048
M. Cassidy, J. Mélou, Y. Quéau, F. Lauze, Jean-Denis Durou
In this article we show how to extend the multi-view stereo technique to the case where the object to be reconstructed lies inside a transparent, refractive material, which causes distortions in the images. We provide a theoretical formulation of the problem that accounts for a general, non-planar shape of the refractive interface, followed by a discrete solving method; both are validated by tests on synthetic and real data.
Citations: 9
Two-Stage Relation Constraint for Semantic Segmentation of Point Clouds
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00037
Minghui Yu, Jinxian Liu, Bingbing Ni, Caiyuan Li
Key to point cloud semantic segmentation is learning discriminative representations that capture effective relations among points. Many works add hard constraints on points through predefined convolution kernels. Motivated by the label propagation algorithm, we develop Dynamic Adjustable Group Propagation (DAGP), with a dynamically adjustable scale module that approximates the distance parameter. Based on DAGP, we develop a novel Two-Stage Propagation framework (TSP) that adds intra-group and inter-group relation constraints on representations to enhance the discrimination of features across group levels. We adopt a well-established backbone to extract features from the input point cloud and then divide the points into groups. In the first stage, DAGP propagates information within each group. To disseminate information between groups more efficiently, a selection strategy is introduced to select group pairs for the second stage, in which DAGP propagates labels among the selected group pairs. By training with this new learning architecture, the backbone network is forced to mine relational context information within and between groups without introducing any extra computational burden during inference. Extensive experimental results show that TSP significantly improves the performance of existing popular architectures (PointNet, PointNet++, DGCNN) on large-scale scene segmentation benchmarks (S3DIS, ScanNet) and the part segmentation dataset ShapeNet.
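For reference, the classic label propagation update that motivates DAGP fits in a few lines. In the sketch below the Gaussian scale sigma is a fixed hyperparameter; this is precisely the distance parameter that DAGP's dynamically adjustable scale module learns to approximate. Names and defaults are our own.

```python
import numpy as np

def propagate_labels(X, Y, sigma=0.5, alpha=0.9, iters=50):
    """Classic label propagation within one group of points.
    X: (n, 3) point coordinates; Y: (n, c) one-hot seed labels
    (all-zero rows are unlabeled)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    d_inv = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = d_inv[:, None] * W * d_inv[None, :]     # symmetric normalization
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * S @ F + (1.0 - alpha) * Y   # propagate, keep seeds
    return F.argmax(1)
```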
Citations: 0
VIPNet: A Fast and Accurate Single-View Volumetric Reconstruction by Learning Sparse Implicit Point Guidance
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00065
Dong Du, Zhiyi Zhang, Xiaoguang Han, Shuguang Cui, Ligang Liu
With the advent of deep neural networks, learning-based single-view reconstruction has gained popularity. However, in 3D there is no absolutely dominant representation that is both computationally efficient and accurate while still allowing the reconstruction of high-resolution geometry of arbitrary topology. After all, accurate implicit methods are time-consuming due to dense sampling and inference, while volumetric approaches are fast but limited by heavy memory usage and low accuracy. In this paper, we propose VIPNet, an end-to-end hybrid representation learning approach for fast and accurate single-view reconstruction under sparse implicit point guidance. Given an image, it first generates a volumetric result. Meanwhile, a corresponding implicit shape representation is learned. To balance efficiency and accuracy, we adopt PointGenNet to learn a set of representative points that guide voxel refinement with the corresponding sparse implicit inference. A strategy of patch-based synthesis with global-local features under implicit guidance is also applied to reduce the memory consumption required to generate high-resolution output. Extensive experiments demonstrate the effectiveness of our method both qualitatively and quantitatively, indicating that the proposed hybrid learning outperforms separate representation learning. Specifically, our network not only runs 60 times faster than implicit methods but also contributes to accuracy gains. We hope it will inspire a re-thinking of hybrid representation learning.
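The guidance idea can be pictured as sparse points selecting where the coarse volume is worth re-evaluating with the implicit function. The toy sketch below is our own reading of that mechanism: in the paper the representative points are predicted by PointGenNet and the implicit inference is learned, whereas here both are supplied as inputs.

```python
import numpy as np

def refine_voxels(occ, implicit_f, points, radius=2):
    """Re-evaluate the coarse occupancy grid near sparse guidance points.
    occ: (r, r, r) occupancy in [0, 1]; implicit_f: maps (k, 3) coordinates
    in [-1, 1]^3 to occupancy values; points: (m, 3) guidance points."""
    r = occ.shape[0]
    idx = np.round((points + 1.0) / 2.0 * (r - 1)).astype(int)
    for c in idx:
        lo, hi = np.maximum(c - radius, 0), np.minimum(c + radius + 1, r)
        X, Y, Z = np.meshgrid(*[np.arange(lo[a], hi[a]) for a in range(3)],
                              indexing="ij")
        world = np.stack([X, Y, Z], -1) / (r - 1) * 2.0 - 1.0
        occ[X, Y, Z] = implicit_f(world.reshape(-1, 3)).reshape(X.shape)
    return occ

# Toy usage: a sphere stands in for the learned implicit function.
sphere = lambda p: (np.linalg.norm(p, axis=1) < 0.5).astype(float)
grid = refine_voxels(np.zeros((32, 32, 32)), sphere, np.zeros((1, 3)))
```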
Citations: 4
A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00026
Riccardo Spezialetti, D. Tan, A. Tonioni, Keisuke Tateno, Federico Tombari
Estimating the 3D shape of an object from a single image or multiple images has gained popularity thanks to recent breakthroughs powered by deep learning. Most approaches regress the full object shape in a canonical pose, possibly extrapolating the occluded parts based on learned priors. However, their viewpoint-invariant technique often discards the unique structures visible in the input images. In contrast, this paper proposes to rely on viewpoint-variant reconstructions by merging the visible information from the given views. Our approach is divided into three steps. Starting from the sparse views of the object, we first align them into a common coordinate system by estimating the relative pose between all pairs. Then, inspired by traditional voxel carving, we generate an occupancy grid of the object from the silhouettes in the images and their relative poses. Finally, we refine the initial reconstruction to build a clean 3D model which preserves the details from each viewpoint. To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
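The carving step admits a compact illustration: a voxel survives only if it projects inside the silhouette in every view. The sketch below assumes pinhole 3x4 projection matrices and an object fully visible in all views; it is the classic algorithm the paper builds on, not its refinement stage.

```python
import numpy as np

def carve(silhouettes, projs, res=64, bound=1.0):
    """silhouettes: list of boolean (H, W) masks; projs: list of 3x4 camera
    matrices. Returns a (res, res, res) boolean occupancy grid."""
    g = np.linspace(-bound, bound, res)
    X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], -1).reshape(-1, 4)
    occ = np.ones(len(pts), dtype=bool)
    for mask, P in zip(silhouettes, projs):
        uvw = pts @ P.T                                  # project all voxels
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)  # pixel column
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)  # pixel row
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        occ &= inside                              # carve voxels off-frame
        occ[inside] &= mask[v[inside], u[inside]]  # ... and outside the mask
    return occ.reshape(res, res, res)
```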
Citations: 2
Compression and Completion of Animated Point Clouds using Topological Properties of the Manifold
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00083
Linfei Pan, L. Ladicky, M. Pollefeys
Recent progress in consumer hardware has allowed for the collection of large amounts of animated point cloud data, which is on the one hand highly redundant and on the other hand incomplete. Our goal is to bridge this gap and find a low-dimensional representation capable of approximation to a desired precision and of completing missing data. Model-less non-rigid 3D reconstruction algorithms, formulated as a linear factorization of observed point tracks into a static shape component and a dynamic pose, have been found insufficient for creating suitable generative models capable of generating new unobserved poses. This is due to the non-locality of the linear models, which over-fit to the non-causal correlations present in the data, manifesting in reconstructions where parts that are not directly connected behave rigidly. In this paper, we propose a new method that can distinguish body parts and factorize the data into shape and pose purely using topological properties of the manifold: local deformations and neighborhoods. To obtain a localized factorization, we formulate the deformation distance between two point tracks as the smallest deformation along the path between them. After embedding this distance in a low-dimensional space, clustering the embedded data yields close-to-rigid components, suitable as initialization for fitting a model: a skinned rigged mesh, used extensively in computer graphics. As both local deformations and neighborhoods of a point are local and can be estimated from only part of the animation, the method can be used to recover unobserved data in each frame.
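The factorize-by-topology pipeline (deformation distance, embedding, clustering) can be prototyped compactly. In the sketch below, the variance of pairwise point distances over time stands in for the paper's path-based smallest-deformation distance, classical MDS provides the embedding, and k-means yields the close-to-rigid components; all of these substitutions are ours.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def rigid_parts(tracks, k=3):
    """tracks: (T, n, 3) animated point positions. Returns a part label per
    point. A rigid pair keeps a constant distance over time, so the standard
    deviation of pairwise distances is (near) zero within a rigid part."""
    T, n, _ = tracks.shape
    d = np.linalg.norm(tracks[:, :, None] - tracks[:, None, :], axis=-1)
    D = d.std(axis=0)                      # (n, n) deformation scores
    J = np.eye(n) - np.ones((n, n)) / n    # classical MDS embedding of D
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    emb = V[:, -3:] * np.sqrt(np.clip(w[-3:], 0.0, None))
    _, labels = kmeans2(emb, k, minit="++", seed=0)
    return labels
```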
Citations: 0
Shape from Tracing: Towards Reconstructing 3D Object Geometry and SVBRDF Material from Images via Differentiable Path Tracing
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00129
Purvi Goel, L. Cohen, James Guesman, V. Thamizharasan, J. Tompkin, Daniel Ritchie
Reconstructing object geometry and material from multiple views typically requires optimization. Differentiable path tracing is an appealing framework, as it can reproduce complex appearance effects. However, it is difficult to use due to its high computational cost. In this paper, we explore how to use differentiable ray tracing to refine an initial coarse mesh and a per-mesh-facet material representation. In simulation, we find that it is possible to reconstruct fine geometric and material detail from low-resolution input views, allowing high-quality reconstructions in a few hours despite the expense of path tracing. The reconstructions successfully disambiguate shading, shadow, and global illumination effects, such as diffuse interreflection, from material properties. We demonstrate the impact of different geometry initializations, including space carving, multi-view stereo, and 3D neural networks. Finally, with input captured using smartphone video and a consumer 360° camera for lighting estimation, we also show how to refine initial reconstructions of real-world objects in unconstrained environments.
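At its core, shape-from-tracing is analysis by synthesis: render, compare to the photographs, and backpropagate through the renderer. The loop below is a minimal sketch of that idea in PyTorch; `render` is a placeholder for any differentiable path tracer, and the Adam optimizer, learning rate, and plain photometric L2 loss are our assumptions rather than the paper's exact setup.

```python
import torch

def refine(verts, materials, targets, views, render, steps=200, lr=1e-2):
    """Refine coarse mesh vertices and per-facet material parameters by
    gradient descent through a differentiable renderer.
    render(verts, materials, view) must return a differentiable image."""
    verts = verts.clone().requires_grad_(True)
    materials = materials.clone().requires_grad_(True)
    opt = torch.optim.Adam([verts, materials], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for img, view in zip(targets, views):
            est = render(verts, materials, view)        # path-traced estimate
            loss = loss + torch.mean((est - img) ** 2)  # photometric L2
        loss.backward()  # gradients flow through the Monte Carlo renderer
        opt.step()
    return verts.detach(), materials.detach()
```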
Citations: 13