In this paper we propose a pipeline for estimating 3D room layout with object and material attribute prediction using a spherical stereo image pair. We assume that the room and objects can be represented as cuboids aligned to the main axes of the room coordinate system (Manhattan world). A spherical stereo alignment algorithm is proposed to align two spherical images to the global world coordinate system. Depth information of the scene is estimated by stereo matching between the images. Cubic projection images of the spherical RGB and estimated depth are used for object and material attribute detection. A single Convolutional Neural Network is designed to assign object and attribute labels to geometrical elements built from the spherical image. Finally, a simplified room layout is reconstructed by cuboid fitting. The reconstructed cuboid-based model shows the structure of the scene with object information and material attributes.
{"title":"Room Layout Estimation with Object and Material Attributes Information Using a Spherical Camera","authors":"Hansung Kim, T. D. Campos, A. Hilton","doi":"10.1109/3DV.2016.83","DOIUrl":"https://doi.org/10.1109/3DV.2016.83","url":null,"abstract":"In this paper we propose a pipeline for estimating 3D room layout with object and material attribute prediction using a spherical stereo image pair. We assume that the room and objects can be represented as cuboids aligned to the main axes of the room coordinate (Manhattan world). A spherical stereo alignment algorithm is proposed to align two spherical images to the global world coordinate system. Depth information of the scene is estimated by stereo matching between images. Cubic projection images of the spherical RGB and estimated depth are used for object and material attribute detection. A single Convolutional Neural Network is designed to assign object and attribute labels to geometrical elements built from the spherical image. Finally simplified room layout is reconstructed by cuboid fitting. The reconstructed cuboid-based model shows the structure of the scene with object information and material attributes.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126633493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
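The cubic projection step mentioned in the abstract above can be illustrated with a small sketch. It maps a pixel of an equirectangular (spherical) image to a viewing direction and then to one of six cube faces. The axis convention (+z forward, +x right, +y up), the face names, the face-coordinate orientation, and both function names are assumptions made for this illustration; the paper's own conventions are not stated in the abstract.

```python
import math


def equirect_pixel_to_direction(u, v, width, height):
    """Map an equirectangular pixel centre to a unit viewing direction.

    Assumed convention: longitude runs from -pi to pi left to right,
    latitude from +pi/2 to -pi/2 top to bottom; +z forward, +x right, +y up.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return x, y, z


def direction_to_cube_face(x, y, z):
    """Return (face, s, t) for a non-zero direction.

    The face is chosen by the dominant axis; (s, t) are coordinates on that
    face in [-1, 1], with an orientation chosen arbitrarily for this sketch.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:
        if z > 0:
            return "front", x / az, y / az
        return "back", -x / az, y / az
    if ax >= ay:
        if x > 0:
            return "right", -z / ax, y / ax
        return "left", z / ax, y / ax
    if y > 0:
        return "top", x / ay, -z / ay
    return "bottom", x / ay, z / ay
```

Running every pixel of the spherical RGB or depth image through these two functions (or, more commonly, the inverse mapping from each cube-face pixel back to the sphere) yields six perspective-like images that standard image-based detectors can consume.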
J. Zienkiewicz, Akis Tsiotsios, A. Davison, Stefan Leutenegger
We present a scalable, real-time capable method for robust surface reconstruction that explicitly handles multiple scales. As a monocular camera browses a scene, our algorithm processes images as they arrive and incrementally builds a detailed surface model. While most of the existing reconstruction approaches rely on volumetric or point-cloud representations of the environment, we perform depth-map and colour fusion directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. Our method relies on least-squares optimisation, which enables a probabilistically sound and principled formulation of the fusion algorithm. We demonstrate that our method is capable of obtaining high quality, close-up reconstruction, as well as capturing overall scene geometry, while being memory and computationally efficient.
{"title":"Monocular, Real-Time Surface Reconstruction Using Dynamic Level of Detail","authors":"J. Zienkiewicz, Akis Tsiotsios, A. Davison, Stefan Leutenegger","doi":"10.1109/3DV.2016.82","DOIUrl":"https://doi.org/10.1109/3DV.2016.82","url":null,"abstract":"We present a scalable, real-time capable method for robust surface reconstruction that explicitly handles multiple scales. As a monocular camera browses a scene, our algorithm processes images as they arrive and incrementally builds a detailed surface model.While most of the existing reconstruction approaches rely on volumetric or point-cloud representations of the environment, we perform depth-map and colour fusion directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. Our method relies on least-squares optimisation, which enables a probabilistically sound and principled formulation of the fusion algorithm.We demonstrate that our method is capable of obtaining high quality, close-up reconstruction, as well as capturing overall scene geometry, while being memory and computationally efficient.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124225126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
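To make the link between least squares and probabilistic fusion concrete, here is a minimal sketch for a single scalar, such as the depth of one mesh vertex. Under the assumption of independent Gaussian measurement noise, the least-squares estimate is the inverse-variance weighted mean, which can be updated incrementally as frames arrive. The class name `VertexEstimate` and its fields are hypothetical; the paper's actual formulation over a multi-resolution mesh is more involved and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class VertexEstimate:
    """Running least-squares estimate of one scalar (e.g. a vertex depth)."""

    value: float = 0.0
    information: float = 0.0  # sum of inverse variances; 0 means no data yet

    def fuse(self, measurement: float, variance: float) -> None:
        """Fold in one noisy measurement with the given (positive) variance."""
        weight = 1.0 / variance
        total = self.information + weight
        # Weighted mean minimises sum_i (z_i - value)^2 / variance_i.
        self.value = (self.information * self.value + weight * measurement) / total
        self.information = total
```

Only two numbers are stored per estimate, so memory stays constant no matter how many frames are fused, and measurements with lower variance (for example, close-up views) pull the estimate more strongly.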
Liming Yang, Hideaki Uchiyama, Jean-Marie Normand, G. Moreau, H. Nagahara, R. Taniguchi
We present a fast and accurate method for reconstructing surfaces of revolution (SoR) on 3D data and its application to structural modeling of a cluttered scene in real time. To estimate a SoR axis, we derive an approximately linear cost function for fast convergence. We also design a framework for reconstructing SoR on dense SLAM. Our experimental results show that the method is accurate, robust to noise, and runs in real time.
{"title":"Real-Time Surface of Revolution Reconstruction on Dense SLAM","authors":"Liming Yang, Hideaki Uchiyama, Jean-Marie Normand, G. Moreau, H. Nagahara, R. Taniguchi","doi":"10.1109/3DV.2016.13","DOIUrl":"https://doi.org/10.1109/3DV.2016.13","url":null,"abstract":"We present a fast and accurate method for reconstructing surfaces of revolution (SoR) on 3D data and its application to structural modeling of a cluttered scene in real-time. To estimate a SoR axis, we derive an approximately linear cost function for fast convergence. Also, we design a framework for reconstructing SoR on dense SLAM. In the experiment results, we show our method is accurate, robust to noise and runs in real-time.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132596217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vahid Soleimani, M. Mirmehdi, D. Damen, S. Hannuna, M. Camplani
We present an automatic, open source data acquisition and calibration approach using two opposing RGBD sensors (Kinect V2) and demonstrate its efficacy for dynamic object reconstruction in the context of monitoring for remote lung function assessment. First, the relative pose of the two RGBD sensors is estimated through a calibration stage and rigid transformation parameters are computed. These are then used to align and register point clouds obtained from the sensors at frame level. We validated the proposed system by performing experiments on known-size box objects with the results demonstrating accurate measurements. We also report on dynamic object reconstruction by way of human subjects undergoing respiratory functional assessment.
{"title":"3D Data Acquisition and Registration Using Two Opposing Kinects","authors":"Vahid Soleimani, M. Mirmehdi, D. Damen, S. Hannuna, M. Camplani","doi":"10.1109/3DV.2016.21","DOIUrl":"https://doi.org/10.1109/3DV.2016.21","url":null,"abstract":"We present an automatic, open source data acquisition and calibration approach using two opposing RGBD sensors (Kinect V2) and demonstrate its efficacy for dynamic object reconstruction in the context of monitoring for remote lung function assessment. First, the relative pose of the two RGBD sensors is estimated through a calibration stage and rigid transformation parameters are computed. These are then used to align and register point clouds obtained from the sensors at frame level. We validated the proposed system by performing experiments on known-size box objects with the results demonstrating accurate measurements. We also report on dynamic object reconstruction by way of human subjects undergoing respiratory functional assessment.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134313143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
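The registration step described above, applying calibrated rigid transformation parameters to bring both point clouds into one frame, can be sketched as follows. The calibration itself is not shown; the rotation `R_ab` and translation `t_ab` are assumed to be given, and all function names are hypothetical rather than taken from the authors' open source code.

```python
import math


def rotation_about_y(angle_rad):
    """3x3 rotation matrix for a rotation about the vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def apply_rigid(R, t, points):
    """Return R * p + t for every 3D point p in `points`."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def register(points_a, points_b, R_ab, t_ab):
    """Merge sensor B's cloud into sensor A's frame using the calibrated pose."""
    return list(points_a) + apply_rigid(R_ab, t_ab, points_b)
```

As a usage example under assumed geometry, two sensors facing each other 3 m apart along A's z axis correspond to `R_ab = rotation_about_y(math.pi)` and `t_ab = (0.0, 0.0, 3.0)`; a point 1 m in front of sensor B then lands 2 m in front of sensor A.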
Abdelaziz Djelouah, Jean-Sébastien Franco, Edmond Boyer, P. Pérez, G. Drettakis
We address the problem of multi-view video segmentation of dynamic scenes in general and outdoor environments with possibly moving cameras. Multi-view methods for dynamic scenes usually rely on geometric calibration to impose spatial shape constraints between viewpoints. In this paper, we show that the calibration constraint can be relaxed while still getting competitive segmentation results using multi-view constraints. We introduce new multi-view cotemporality constraints through motion correlation cues, in addition to common appearance features used by co-segmentation methods to identify co-instances of objects. We also take advantage of learning based segmentation strategies by casting the problem as the selection of monocular proposals that satisfy multi-view constraints. This yields a fully automated method that can segment subjects of interest without any particular pre-processing stage. Results on several challenging outdoor datasets demonstrate the feasibility and robustness of our approach.
{"title":"Cotemporal Multi-View Video Segmentation","authors":"Abdelaziz Djelouah, Jean-Sébastien Franco, Edmond Boyer, P. Pérez, G. Drettakis","doi":"10.1109/3DV.2016.45","DOIUrl":"https://doi.org/10.1109/3DV.2016.45","url":null,"abstract":"We address the problem of multi-view video segmentation of dynamic scenes in general and outdoor environments with possibly moving cameras. Multi-view methods for dynamic scenes usually rely on geometric calibration to impose spatial shape constraints between viewpoints. In this paper, we show that the calibration constraint can be relaxed while still getting competitive segmentation results using multi-view constraints. We introduce new multi-view cotemporality constraints through motion correlation cues, in addition to common appearance features used by co-segmentation methods to identify co-instances of objects. We also take advantage of learning based segmentation strategies by casting the problem as the selection of monocular proposals that satisfy multi-view constraints. This yields a fully automated method that can segment subjects of interest without any particular pre-processing stage. Results on several challenging outdoor datasets demonstrate the feasibility and robustness of our approach.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"13 4-5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123731845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aurela Shehu, Jinlong Yang, Jean-Sébastien Franco, Franck Hétroy-Wheeler, S. Wuhrer
In this paper, we address the problem of temporal alignment of surfaces for subjects dressed in wide clothing, as acquired by calibrated multi-camera systems. Most existing methods solve the alignment by fitting a single surface template to each instant's 3D observations, relying on a dense point-to-point correspondence scheme, e.g. by matching individual surface points based on local geometric features or proximity. The wide clothing situation yields more geometric and topological difficulties in observed sequences, such as apparent merging of surface components, misreconstructions, and partial surface observation, resulting in overly sparse, erroneous point-to-point correspondences, and thus alignment failures. To resolve these issues, we propose an alignment framework where point-to-point correspondences are obtained by growing isometric patches from a set of reliably obtained body landmarks. This correspondence decreases the reliance on local geometric features subject to instability, instead emphasizing the surface neighborhood coherence of matches, while improving density given sufficient landmark coverage. We validate and verify the resulting improved alignment performance in our experiments.
{"title":"Computing Temporal Alignments of Human Motion Sequences in Wide Clothing Using Geodesic Patches","authors":"Aurela Shehu, Jinlong Yang, Jean-Sébastien Franco, Franck Hétroy-Wheeler, S. Wuhrer","doi":"10.1109/3DV.2016.27","DOIUrl":"https://doi.org/10.1109/3DV.2016.27","url":null,"abstract":"In this paper, we address the problem of temporal alignment of surfaces for subjects dressed in wide clothing, as acquired by calibrated multi-camera systems. Most existing methods solve the alignment by fitting a single surface template to each instant's 3D observations, relying on a dense point-to-point correspondence scheme, e.g. by matching individual surface points based on local geometric features or proximity. The wide clothing situation yields more geometric and topological difficulties in observed sequences, such as apparent merging of surface components, misreconstructions, and partial surface observation, resulting in overly sparse, erroneous point-to-point correspondences, and thus alignment failures. To resolve these issues, we propose an alignment framework where point-to-point correspondences are obtained by growing isometric patches from a set of reliably obtained body landmarks. This correspondence decreases the reliance on local geometric features subject to instability, instead emphasizing the surface neighborhood coherence of matches, while improving density given sufficient landmark coverage. We validate and verify the resulting improved alignment performance in our experiments.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130650657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a method for regularizing noisy 3D reconstructions, which is especially well suited for scenes containing planar structures like buildings. At horizontal structures, the input model is divided into slices and for each slice, an inside/outside labeling is computed. With the outlines of each slice labeling, we create an irregularly shaped volumetric cell decomposition of the whole scene. Then, an optimized inside/outside labeling of these cells is computed by solving an energy minimization problem. For the cell labeling optimization we introduce a novel smoothness term, where lines in the images are used to improve the regularization result. We show that our approach can take arbitrary dense meshed point clouds as input and delivers well regularized building models, which can be textured afterwards.
{"title":"Regularized 3D Modeling from Noisy Building Reconstructions","authors":"Thomas Holzmann, F. Fraundorfer, H. Bischof","doi":"10.1109/3DV.2016.62","DOIUrl":"https://doi.org/10.1109/3DV.2016.62","url":null,"abstract":"In this paper, we present a method for regularizing noisy 3D reconstructions, which is especially well suited for scenes containing planar structures like buildings. At horizontal structures, the input model is divided into slices and for each slice, an inside/outside labeling is computed. With the outlines of each slice labeling, we create an irregularly shaped volumetric cell decomposition of the whole scene. Then, an optimized inside/outside labeling of these cells is computed by solving an energy minimization problem. For the cell labeling optimization we introduce a novel smoothness term, where lines in the images are used to improve the regularization result. We show that our approach can take arbitrary dense meshed point clouds as input and delivers well regularized building models, which can be textured afterwards.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125120728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lucas Thies, M. Zollhöfer, Christian Richardt, C. Theobalt, G. Greiner
We present a novel approach for real-time joint reconstruction of 3D scene motion and geometry from binocular stereo videos. Our approach is based on a novel variational halfway-domain scene flow formulation, which allows us to obtain highly accurate spatiotemporal reconstructions of shape and motion. We solve the underlying optimization problem at real-time frame rates using a novel data-parallel robust non-linear optimization strategy. Fast convergence and large displacement flows are achieved by employing a novel hierarchy that stores delta flows between hierarchy levels. High performance is obtained by the introduction of a coarser warp grid that decouples the number of unknowns from the input resolution of the images. We demonstrate our approach in a live setup that is based on two commodity webcams, as well as on publicly available video data. Our extensive experiments and evaluations show that our approach produces high-quality dense reconstructions of 3D geometry and scene flow at real-time frame rates, and compares favorably to the state of the art.
{"title":"Real-Time Halfway Domain Reconstruction of Motion and Geometry","authors":"Lucas Thies, M. Zollhöfer, Christian Richardt, C. Theobalt, G. Greiner","doi":"10.1109/3DV.2016.55","DOIUrl":"https://doi.org/10.1109/3DV.2016.55","url":null,"abstract":"We present a novel approach for real-time joint reconstruction of 3D scene motion and geometry from binocular stereo videos. Our approach is based on a novel variational halfway-domain scene flow formulation, which allows us to obtain highly accurate spatiotemporal reconstructions of shape and motion. We solve the underlying optimization problem at real-time frame rates using a novel data-parallel robust non-linear optimization strategy. Fast convergence and large displacement flows are achieved by employing a novel hierarchy that stores delta flows between hierarchy levels. High performance is obtained by the introduction of a coarser warp grid that decouples the number of unknowns from the input resolution of the images. We demonstrate our approach in a live setup that is based on two commodity webcams, as well as on publicly available video data. Our extensive experiments and evaluations show that our approach produces high-quality dense reconstructions of 3D geometry and scene flow at real-time frame rates, and compares favorably to the state of the art.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129215349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Nobuhara, T. Kashino, T. Matsuyama, Kouta Takeuchi, K. Fujii
This paper presents a new algorithm for multi-path interference resolution in mirror-based full 3D capture using a single correlation-based ToF camera. Our algorithm does not require additional captures or device modifications, and resolves the interference using the same single ToF capture that is used for the 3D reconstruction. Evaluations with real images prove the concept of the proposed algorithm qualitatively and quantitatively.
{"title":"A Single-Shot Multi-Path Interference Resolution for Mirror-Based Full 3D Shape Measurement with a Correlation-Based ToF Camera","authors":"S. Nobuhara, T. Kashino, T. Matsuyama, Kouta Takeuchi, K. Fujii","doi":"10.1109/3DV.2016.43","DOIUrl":"https://doi.org/10.1109/3DV.2016.43","url":null,"abstract":"This paper is aimed at presenting a new algorithm for multi-path interference resolutions under mirror-based full 3D capture using a single correlation-based ToF camera. Our algorithm does not require additional captures or device modifications, and resolves the interference using a single ToF sensing that is also used for the 3D reconstruction as well. Evaluations with real images prove the concept of the proposed algorithm qualitatively and quantitatively.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115065811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inverse procedural modeling discovers a procedural representation of an existing geometric model and the discovered procedural model then supports synthesizing new similar models. We introduce an automatic approach that generates a compact, efficient, and re-usable procedural representation of a polygonal 3D architectural model. This representation is then used for structure-aware editing and synthesis of new geometric models that resemble the original. Our framework captures the pattern hierarchy of the input model into a split tree data representation. A context-free split grammar, supporting a hierarchical nesting of procedural rules, is extracted from the tree, which establishes the base of our interactive procedural editing engine. We show the application of our approach to a variety of architectural structures obtained by procedurally editing web-sourced models. The grammar generation takes a few minutes even for the most complex input and synthesis is fully interactive for buildings composed of up to 200k polygons.
{"title":"Proceduralization for Editing 3D Architectural Models","authors":"Ilke Demir, Daniel G. Aliaga, Bedrich Benes","doi":"10.1109/3DV.2016.28","DOIUrl":"https://doi.org/10.1109/3DV.2016.28","url":null,"abstract":"Inverse procedural modeling discovers a procedural representation of an existing geometric model and the discovered procedural model then supports synthesizing new similar models. We introduce an automatic approach that generates a compact, efficient, and re-usable procedural representation of a polygonal 3D architectural model. This representation is then used for structure-aware editing and synthesis of new geometric models that resemble the original. Our framework captures the pattern hierarchy of the input model into a split tree data representation. A context-free split grammar, supporting a hierarchical nesting of procedural rules, is extracted from the tree, which establishes the base of our interactive procedural editing engine. We show the application of our approach to a variety of architectural structures obtained by procedurally editing web-sourced models. The grammar generation takes a few minutes even for the most complex input and synthesis is fully interactive for buildings composed of up to 200k polygons.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125977711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
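To give a feel for the kind of representation the abstract above describes, the following sketch shows a tiny hierarchical split grammar and its forward derivation on 2D rectangles. It illustrates only how nested split rules generate geometry and how editing a rule changes the result; it does not implement the extraction of a grammar from an input model. The `Shape` class, the `RULES` table, and the labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """An axis-aligned rectangle tagged with a grammar symbol."""

    label: str
    x: float
    y: float
    w: float
    h: float


# Hypothetical rules: symbol -> (split axis, number of equal parts, child symbol).
RULES = {
    "facade": ("y", 3, "floor"),
    "floor": ("x", 4, "window_tile"),
}


def split(shape, axis, parts, child_label):
    """Cut a rectangle into `parts` equal pieces along one axis."""
    if axis == "x":
        step = shape.w / parts
        return [Shape(child_label, shape.x + i * step, shape.y, step, shape.h)
                for i in range(parts)]
    step = shape.h / parts
    return [Shape(child_label, shape.x, shape.y + i * step, shape.w, step)
            for i in range(parts)]


def derive(shape, rules=RULES):
    """Expand a shape recursively until only terminal shapes remain."""
    if shape.label not in rules:
        return [shape]  # terminal symbol: no rule applies
    axis, parts, child = rules[shape.label]
    terminals = []
    for piece in split(shape, axis, parts, child):
        terminals.extend(derive(piece, rules))
    return terminals
```

Deriving `Shape("facade", 0, 0, 12, 9)` with these rules produces twelve 3 by 3 window tiles; changing the `"floor"` rule to six parts yields eighteen narrower tiles without touching any geometry by hand, which is the appeal of editing at the rule level.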