Deep Sketch-Based Modeling: Tips and Tricks
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00064
Yue Zhong, Yulia Gryaditskaya, Honggang Zhang, Yi-Zhe Song
Deep image-based modeling has received much attention in recent years, yet the parallel problem of sketch-based modeling has only been briefly studied, often as a potential application. In this work, for the first time, we identify the main differences between sketch and image inputs: (i) style variance, (ii) imprecise perspective, and (iii) sparsity. We discuss why each of these differences can pose a challenge, and even make a certain class of image-based methods inapplicable. We study alternative solutions to address each of these differences. By doing so, we derive a few important insights: (i) sparsity commonly results in an incorrect prediction of foreground versus background, (ii) diversity of human styles, if not taken into account, can lead to very poor generalization properties, and finally (iii) unless a dedicated sketching interface is used, one cannot expect sketches to match the perspective of a fixed viewpoint. Finally, we compare a set of representative deep single-image modeling solutions and show how their performance can be improved to tackle sketch input by taking into consideration the identified critical differences.
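These observations suggest simple training-time counter-measures. Below is a minimal augmentation sketch, assuming binary raster sketches stored as NumPy arrays, that perturbs both perspective and stroke style; the warp magnitudes and the dilation/erosion trick are illustrative assumptions, not the procedure evaluated in the paper.

```python
# Hypothetical augmentation sketch: perturb the perspective and stroke style
# of a binary raster sketch so a model does not over-fit to one drawing style
# or to a perfectly aligned viewpoint. Magnitudes are illustrative.
import numpy as np
from scipy import ndimage

def augment_sketch(sketch, rng):
    """sketch: 2D float array in [0, 1], 1 = stroke pixel."""
    h, w = sketch.shape
    # Small random affine warp approximating an imprecise perspective.
    angle = rng.uniform(-8, 8) * np.pi / 180.0
    shear = rng.uniform(-0.1, 0.1)
    scale = rng.uniform(0.9, 1.1)
    A = np.array([[np.cos(angle), -np.sin(angle) + shear],
                  [np.sin(angle),  np.cos(angle)]]) * scale
    center = np.array([h / 2.0, w / 2.0])
    offset = center - A @ center              # keep the warp centered
    warped = ndimage.affine_transform(sketch, A, offset=offset, order=1)
    # Random stroke thickening/thinning to mimic pen-width (style) variance.
    binary = warped > 0.5
    if rng.random() < 0.5:
        binary = ndimage.binary_dilation(binary, iterations=int(rng.integers(1, 3)))
    else:
        binary = ndimage.binary_erosion(binary, iterations=1)
    return binary.astype(np.float32)

rng = np.random.default_rng(0)
sketch = np.zeros((128, 128), dtype=np.float32)
sketch[40:90, 60:68] = 1.0                    # a single thick stroke as toy input
augmented = augment_sketch(sketch, rng)
```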
Adiabatic Quantum Graph Matching with Permutation Matrix Constraints
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00068
Marcel Seelbach Benkner, Vladislav Golyanik, C. Theobalt, Michael Moeller
Matching problems on 3D shapes and images are challenging, as they are frequently formulated as combinatorial quadratic assignment problems (QAPs) with permutation matrix constraints, which are NP-hard. In this work, we address such problems with emerging quantum computing technology and propose several reformulations of QAPs as unconstrained problems suitable for efficient execution on quantum hardware. We investigate several ways to inject permutation matrix constraints into a quadratic unconstrained binary optimization (QUBO) problem, which can be mapped to quantum hardware. We focus on obtaining a sufficient spectral gap, which further increases the probability of measuring optimal solutions and valid permutation matrices in a single run. We perform our experiments on the D-Wave 2000Q quantum computer (2^11 qubits, adiabatic). Despite the observed discrepancy between simulated adiabatic quantum computing and execution on real quantum hardware, our reformulation of permutation matrix constraints increases the robustness of the numerical computations over other penalty approaches in our experiments. The proposed algorithm has the potential to scale to higher dimensions on future quantum computing architectures, which opens up multiple new directions for solving matching problems in 3D computer vision and graphics.
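For intuition, the standard way to fold permutation constraints into a QUBO is via quadratic penalties on the row and column sums of the vectorized assignment matrix; the paper studies alternatives and refinements of this baseline. The sketch below shows only that baseline penalty formulation on a toy QAP and brute-forces the ground state an annealer would be asked to sample; it is not the authors' reformulation.

```python
# Baseline penalty QUBO for a tiny quadratic assignment problem (QAP):
# minimize x^T Q x over binary x = vec(X), where X should be a permutation
# matrix. Only the standard quadratic-penalty encoding is shown here.
import itertools
import numpy as np

n = 3
rng = np.random.default_rng(1)
F = rng.random((n, n))  # flow matrix
D = rng.random((n, n))  # distance matrix

# Koopmans-Beckmann QAP objective as a QUBO on x[i*n + k] = X[i, k].
Q = np.kron(F, D)

# Quadratic penalties enforcing one-hot rows and columns of X
# (constant terms of (sum_a x_a - 1)^2 are dropped; they do not move the minimum).
penalty = 2.0 * np.abs(Q).sum()   # heuristic weight, large enough to dominate
for idx_sets in (
    [[i * n + j for j in range(n)] for i in range(n)],   # rows
    [[i * n + j for i in range(n)] for j in range(n)],   # columns
):
    for s in idx_sets:
        for a in s:
            Q[a, a] -= penalty
            for b in s:
                if a != b:
                    Q[a, b] += penalty

# Brute-force the QUBO ground state (what an annealer would sample).
best_x, best_e = None, np.inf
for bits in itertools.product([0, 1], repeat=n * n):
    x = np.array(bits)
    e = x @ Q @ x
    if e < best_e:
        best_x, best_e = x, e
print(best_x.reshape(n, n))   # should be a valid permutation matrix
```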
Intrinsic Autoencoders for Joint Deferred Neural Rendering and Intrinsic Image Decomposition
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00128
Hassan Abu Alhaija, Siva Karthik Mustikovela, Justus Thies, V. Jampani, M. Nießner, Andreas Geiger, C. Rother
Neural rendering techniques promise efficient photorealistic image synthesis while providing rich control over scene parameters by learning the physical image formation process. While several supervised methods have been proposed for this task, acquiring a dataset of images with accurately aligned 3D models is very difficult. The main contribution of this work is to lift this restriction by training a neural rendering algorithm from unpaired data. We propose an autoencoder that jointly generates realistic images from synthetic 3D models while decomposing real images into their intrinsic shape and appearance properties. In contrast to a traditional graphics pipeline, our approach does not require all scene properties, such as material parameters and lighting, to be specified by hand. Instead, we learn photo-realistic deferred rendering from a small set of 3D models and a larger set of unaligned real images, both of which are easy to acquire in practice. Simultaneously, we obtain accurate intrinsic decompositions of real images without requiring paired ground truth. Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
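To make the coupling concrete, here is a minimal sketch of the two modules, assuming a tiny convolutional encoder that predicts albedo and normal buffers and a neural deferred renderer that maps the buffers back to an image. The architecture, buffer choice, and cycle loss are illustrative assumptions, not the authors' network.

```python
# Minimal sketch of the intrinsic-autoencoder idea (not the authors' exact
# architecture): an encoder maps a photo to intrinsic buffers (albedo and
# normals), and a neural deferred renderer maps buffers back to an image.
# Synthetic 3D models supply ground-truth buffers; real photos are unpaired.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class IntrinsicEncoder(nn.Module):          # image -> (albedo, normals)
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(conv_block(3, 32), conv_block(32, 32))
        self.albedo = nn.Conv2d(32, 3, 1)
        self.normals = nn.Conv2d(32, 3, 1)
    def forward(self, img):
        f = self.backbone(img)
        return self.albedo(f), torch.tanh(self.normals(f))

class DeferredRenderer(nn.Module):          # (albedo, normals) -> image
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(6, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 1))
    def forward(self, albedo, normals):
        return self.net(torch.cat([albedo, normals], dim=1))

enc, ren = IntrinsicEncoder(), DeferredRenderer()
real = torch.rand(1, 3, 64, 64)             # unpaired real photo
albedo, normals = enc(real)
recon = ren(albedo, normals)                # cycle: photo -> buffers -> photo
cycle_loss = torch.nn.functional.l1_loss(recon, real)
```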
Underwater Scene Recovery Using Wavelength-Dependent Refraction of Light
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00013
S. Ishihara, Yuta Asano, Yinqiang Zheng, Imari Sato
This paper proposes a method for underwater depth estimation from an orthographic multispectral image. In accordance with Snell's law, incoming light is refracted when it enters the water surface, and its direction is determined by the refractive index and the normals of the water surface. The refractive index is wavelength-dependent, which leads to some disparity between images taken at different wavelengths. Given the camera orientation and the refractive index of a medium such as water, our approach can reconstruct the underwater scene with an unknown water surface from the disparity observed in images taken at different wavelengths. We verify the effectiveness of our method through simulations and real experiments on various scenes.
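The geometric core can be written down directly: the vector form of Snell's law bends a ray by an amount that depends on the wavelength-specific refractive index, so the same underwater point appears at slightly different image positions per wavelength. Below is a small NumPy sketch with a flat surface patch and illustrative indices (not the paper's calibration or estimation procedure).

```python
# Vector form of Snell's law: refract an incoming unit ray d at a surface with
# unit normal n, going from air (index 1.0) into water with a wavelength-
# dependent index. The per-wavelength lateral offset of an underwater point is
# the spectral disparity the method exploits. Indices below are illustrative.
import numpy as np

def refract(d, n, eta):
    """d: incident unit direction, n: unit surface normal, eta = n1 / n2."""
    cos_i = -np.dot(n, d)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

d = np.array([np.sin(np.radians(20.0)), 0.0, -np.cos(np.radians(20.0))])  # ray entering water
n = np.array([0.0, 0.0, 1.0])                                             # flat surface normal

depth = 0.5  # metres below the surface
for name, index in [("red  (~1.331)", 1.331), ("blue (~1.343)", 1.343)]:
    t = refract(d, n, eta=1.0 / index)
    s = depth / -t[2]                 # march the refracted ray down to the depth
    print(name, "lateral offset:", (t * s)[0])
# The two offsets differ slightly; given the interface geometry, this
# wavelength disparity can be inverted to recover underwater depth.
```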
Refractive Multi-view Stereo
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00048
M. Cassidy, J. Mélou, Y. Quéau, F. Lauze, Jean-Denis Durou
In this article we show how to extend multi-view stereo to the case where the object to be reconstructed lies inside a transparent, yet refractive, material, which distorts the images. We provide a theoretical formulation of the problem that accounts for a general, non-planar shape of the refractive interface, followed by a discrete solution method; both are validated by tests on synthetic and real data.
Two-Stage Relation Constraint for Semantic Segmentation of Point Clouds
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00037
Minghui Yu, Jinxian Liu, Bingbing Ni, Caiyuan Li
Key to point cloud semantic segmentation is learning discriminative representations that capture effective relations among points. Many works add hard constraints on points through predefined convolution kernels. Motivated by the label propagation algorithm, we develop Dynamic Adjustable Group Propagation (DAGP) with a dynamic adjustable scale module that approximates the distance parameter. Based on DAGP, we develop a novel Two-Stage Propagation framework (TSP) that adds intra-group and inter-group relation constraints on representations to enhance the discrimination of features at different group levels. We adopt a well-established backbone to extract features for the input point cloud and then divide the points into groups. In the first stage, DAGP propagates information within each group. To disseminate information between groups more efficiently, a selection strategy chooses group pairs for the second stage, in which DAGP propagates labels among the selected pairs. By training with this new learning architecture, the backbone network is forced to mine relational context information within and between groups without introducing any extra computation burden during inference. Extensive experimental results show that TSP significantly improves the performance of existing popular architectures (PointNet, PointNet++, DGCNN) on large-scale scene segmentation benchmarks (S3DIS, ScanNet) and the part segmentation dataset ShapeNet.
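For background, the classical label-propagation iteration that motivates DAGP diffuses labels over a similarity graph built from the points, with a fixed scale parameter in the affinity; DAGP replaces that fixed scale with a dynamic, learnable module. A compact NumPy sketch of the classical baseline only:

```python
# Classical label propagation on a point cloud: build a Gaussian affinity
# matrix, normalize it, and iterate F <- alpha * S @ F + (1 - alpha) * Y.
# The scale sigma is the quantity DAGP makes dynamic and learnable; here it
# is a fixed illustrative value.
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((100, 3))
labels = -np.ones(100, dtype=int)        # -1 = unlabeled
labels[:5], labels[50:55] = 0, 1         # a few seed points from two classes

sigma, alpha, n_classes = 0.2, 0.9, 2
dist2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
W = np.exp(-dist2 / (2 * sigma**2))
np.fill_diagonal(W, 0.0)
d = W.sum(1)
S = W / np.sqrt(np.outer(d, d))          # symmetric normalization D^-1/2 W D^-1/2

Y = np.zeros((100, n_classes))
Y[labels >= 0, labels[labels >= 0]] = 1.0
F = Y.copy()
for _ in range(50):
    F = alpha * S @ F + (1 - alpha) * Y
pred = F.argmax(1)                       # propagated labels for all points
```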
VIPNet: A Fast and Accurate Single-View Volumetric Reconstruction by Learning Sparse Implicit Point Guidance
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00065
Dong Du, Zhiyi Zhang, Xiaoguang Han, Shuguang Cui, Ligang Liu
With the advent of deep neural networks, learning-based single-view reconstruction has gained popularity. However, in 3D there is no dominant representation that is both computationally efficient and accurate while allowing reconstruction of high-resolution geometry of arbitrary topology: accurate implicit methods are time-consuming due to dense sampling and inference, while volumetric approaches are fast but limited by heavy memory usage and low accuracy. In this paper, we propose VIPNet, an end-to-end hybrid representation learning framework for fast and accurate single-view reconstruction under sparse implicit point guidance. Given an image, it first generates a volumetric result; meanwhile, a corresponding implicit shape representation is learned. To balance efficiency and accuracy, we adopt PointGenNet to learn representative points that guide the voxel refinement with the corresponding sparse implicit inference. A strategy of patch-based synthesis with global-local features under implicit guidance is also applied to reduce the memory consumption required to generate high-resolution output. Extensive experiments demonstrate the effectiveness of our method both qualitatively and quantitatively, indicating that the proposed hybrid learning outperforms separate representation learning. Specifically, our network not only runs 60 times faster than implicit methods but also yields accuracy gains. We hope it will inspire a rethinking of hybrid representation learning.
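The guidance step can be pictured as querying a learned implicit occupancy function at a sparse set of representative points and overwriting the coarse voxel prediction there. A toy NumPy sketch of that idea, with an analytic occupancy function standing in for the learned implicit network and random points standing in for PointGenNet's output:

```python
# Toy version of sparse implicit guidance: a coarse voxel grid is corrected at
# a few representative points by querying an implicit occupancy function.
# Here the "network" is an analytic sphere occupancy; in the actual system the
# grid, the points, and the implicit function are all predicted from the image.
import numpy as np

res = 32
grid = np.zeros((res, res, res), dtype=np.float32)
grid[8:24, 8:24, 8:24] = 1.0                      # coarse (blocky) prediction

def implicit_occupancy(p):
    """Stand-in for a learned implicit function: occupancy of a sphere."""
    return (np.linalg.norm(p - 0.5, axis=-1) < 0.35).astype(np.float32)

rng = np.random.default_rng(0)
rep_points = rng.random((256, 3))                 # representative query points
occ = implicit_occupancy(rep_points)

# Overwrite the voxels containing the representative points with the
# (presumably more accurate) implicit prediction.
idx = np.clip((rep_points * res).astype(int), 0, res - 1)
grid[idx[:, 0], idx[:, 1], idx[:, 2]] = occ
```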
A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00026
Riccardo Spezialetti, D. Tan, A. Tonioni, Keisuke Tateno, Federico Tombari
Estimating the 3D shape of an object from a single image or multiple images has gained popularity thanks to recent breakthroughs powered by deep learning. Most approaches regress the full object shape in a canonical pose, possibly extrapolating the occluded parts based on learned priors. However, such viewpoint-invariant techniques often discard the unique structures visible in the input images. In contrast, this paper proposes to rely on viewpoint-variant reconstructions obtained by merging the visible information from the given views. Our approach is divided into three steps. Starting from sparse views of the object, we first align them into a common coordinate system by estimating the relative pose between all pairs. Then, inspired by traditional voxel carving, we generate an occupancy grid of the object from the silhouettes in the images and their relative poses. Finally, we refine the initial reconstruction to build a clean 3D model that preserves the details from each viewpoint. To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
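The middle step is classical silhouette-based voxel carving: a voxel survives only if it projects inside the object silhouette in every view. A compact NumPy sketch with toy cameras and silhouettes (illustrative placeholders, not the paper's implementation):

```python
# Classical silhouette-based voxel carving: keep a voxel only if its center
# projects inside the object silhouette in every view. The cameras,
# silhouettes, and grid below are toy placeholders for estimated poses/masks.
import numpy as np

res, n_views, img = 32, 4, 64
# Voxel centers in a unit cube centered at the origin.
lin = (np.arange(res) + 0.5) / res - 0.5
X, Y, Z = np.meshgrid(lin, lin, lin, indexing="ij")
centers = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)

def project(angle):
    """Toy camera circling the object: y-axis rotation + orthographic mapping."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])
    cam = centers @ R.T                        # points in camera coordinates
    uv = (cam[:, :2] + 0.5) * img              # map roughly [-0.5, 0.5] -> pixels
    return uv.astype(int)

occupied = np.ones(len(centers), dtype=bool)
yy, xx = np.mgrid[0:img, 0:img]
for v in range(n_views):
    # Toy silhouette: a centered disc (stand-in for a real segmentation mask).
    sil = (xx - img / 2) ** 2 + (yy - img / 2) ** 2 < (0.3 * img) ** 2
    uv = project(angle=2 * np.pi * v / n_views)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < img) & (uv[:, 1] >= 0) & (uv[:, 1] < img)
    keep = np.zeros(len(centers), dtype=bool)
    keep[inside] = sil[uv[inside, 1], uv[inside, 0]]
    occupied &= keep                            # carve voxels outside any view

volume = occupied.reshape(res, res, res)        # initial occupancy grid to refine
```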
Compression and Completion of Animated Point Clouds using Topological Properties of the Manifold
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00083
Linfei Pan, L. Ladicky, M. Pollefeys
Recent progress in consumer hardware has allowed for the collection of large amounts of animated point cloud data, which is on the one hand highly redundant and on the other hand incomplete. Our goal is to bridge this gap and find a low-dimensional representation capable of approximation to a desired precision and of completing missing data. Model-less non-rigid 3D reconstruction algorithms, formulated as a linear factorization of observed point tracks into a static shape component and a dynamic pose, have been found insufficient for creating suitable generative models capable of generating new, unobserved poses. This is due to the non-locality of the linear models, which over-fit to non-causal correlations present in the data; the result is reconstructions in which parts that are not directly connected behave rigidly together. In this paper, we propose a new method that can distinguish body parts and factorize the data into shape and pose purely using topological properties of the manifold: local deformations and neighborhoods. To obtain a localized factorization, we formulate the deformation distance between two point tracks as the smallest deformation along the path between them. After embedding this distance in a low-dimensional space, clustering the embedded data leads to close-to-rigid components, suitable as initialization for fitting a model: a skinned, rigged mesh, used extensively in computer graphics. As both the local deformations and the neighborhoods of a point can be estimated from only part of the animation, the method can be used to recover unobserved data in each frame.
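The part-discovery pipeline can be prototyped with standard tools: a per-edge deformation measure on a k-NN graph of point tracks, shortest paths to obtain the "smallest deformation along the path" distance, a low-dimensional embedding, and clustering into near-rigid groups. In the sketch below, the per-edge measure (standard deviation of pairwise distance over time) and the toy animation are illustrative stand-ins for the quantities used in the paper.

```python
# Prototype of the part-discovery pipeline: per-edge deformation on a k-NN
# graph, path-wise deformation distance via shortest paths, low-dimensional
# embedding, and clustering into near-rigid parts.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
T, N, k = 20, 60, 8
# Toy animation: points with x > 0.5 form a rotating part, the rest stay still.
base = rng.random((N, 3))
moving = base[:, 0] > 0.5
tracks = np.repeat(base[None], T, axis=0)
for t in range(T):
    a = 0.1 * t
    R = np.array([[1, 0, 0],
                  [0, np.cos(a), -np.sin(a)],
                  [0, np.sin(a),  np.cos(a)]])
    c = base[moving].mean(0)
    tracks[t, moving] = (base[moving] - c) @ R.T + c

# k-NN graph on the rest pose; edge weight = deformation of that edge over time.
d0 = np.linalg.norm(base[:, None] - base[None, :], axis=-1)
knn = np.argsort(d0, axis=1)[:, 1:k + 1]
rows, cols, defo = [], [], []
for i in range(N):
    for j in knn[i]:
        dij = np.linalg.norm(tracks[:, i] - tracks[:, j], axis=-1)
        rows.append(i); cols.append(j); defo.append(dij.std() + 1e-6)
G = csr_matrix((defo, (rows, cols)), shape=(N, N))

D = shortest_path(G, directed=False)        # smallest deformation along the path
emb = MDS(n_components=3, dissimilarity="precomputed").fit_transform(D)
parts = KMeans(n_clusters=2, n_init=10).fit_predict(emb)   # near-rigid components
```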
Shape from Tracing: Towards Reconstructing 3D Object Geometry and SVBRDF Material from Images via Differentiable Path Tracing
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00129
Purvi Goel, L. Cohen, James Guesman, V. Thamizharasan, J. Tompkin, Daniel Ritchie
Reconstructing object geometry and material from multiple views typically requires optimization. Differentiable path tracing is an appealing framework, as it can reproduce complex appearance effects; however, it is difficult to use due to its high computational cost. In this paper, we explore how to use differentiable ray tracing to refine an initial coarse mesh and a per-mesh-facet material representation. In simulation, we find that it is possible to reconstruct fine geometric and material detail from low-resolution input views, allowing high-quality reconstructions in a few hours despite the expense of path tracing. The reconstructions successfully disambiguate shading, shadow, and global illumination effects, such as diffuse interreflection, from material properties. We demonstrate the impact of different geometry initializations, including space carving, multi-view stereo, and 3D neural networks. Finally, with input captured using smartphone video and a consumer 360° camera for lighting estimation, we also show how to refine initial reconstructions of real-world objects in unconstrained environments.
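The refinement itself is ordinary gradient descent through a differentiable renderer. The sketch below shows such a loop optimizing vertex offsets and per-facet albedo against several photographs; `render` is a hypothetical placeholder (a trivial differentiable stand-in, not a path tracer), and the loss and regularizer weights are illustrative assumptions.

```python
# Skeleton of the refinement loop: optimize vertex offsets and per-facet
# albedo by gradient descent through a differentiable renderer. `render` is a
# hypothetical placeholder so the loop runs end to end; a real system would
# call a differentiable path tracer here.
import torch

def render(vertices, faces, albedo, view):
    """Placeholder differentiable renderer: NOT a path tracer. It only
    produces an image-sized tensor that depends on the parameters."""
    shade = vertices.mean() + view * 0.0
    return albedo.mean() * torch.ones(3, 64, 64) * torch.sigmoid(shade)

vertices = torch.rand(100, 3)                       # coarse initial mesh vertices
faces = torch.randint(0, 100, (180, 3))             # triangle indices
target_views = [torch.rand(3, 64, 64) for _ in range(4)]   # captured photos

offsets = torch.zeros_like(vertices, requires_grad=True)
albedo = torch.full((180, 3), 0.5, requires_grad=True)     # per-facet material
opt = torch.optim.Adam([offsets, albedo], lr=1e-2)

for step in range(200):
    opt.zero_grad()
    loss = 0.0
    for v, photo in enumerate(target_views):
        img = render(vertices + offsets, faces, albedo, torch.tensor(float(v)))
        loss = loss + torch.nn.functional.l1_loss(img, photo)
    loss = loss + 1e-3 * offsets.pow(2).sum()       # keep refinement near the init
    loss.backward()
    opt.step()
```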