Hierarchical Co-Segmentation of Building Facades
Andelo Martinovic, L. Gool. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.26

We introduce a new system for automatic discovery of high-level structural representations of building facades. Under the assumption that each facade can be represented as a hierarchy of rectilinear subdivisions, our goal is to find the optimal direction of splitting, along with the number and positions of the split lines at each level of the tree. Unlike previous approaches, where each facade is analysed in isolation, we propose a joint analysis of a set of facade images. Initially, a co-segmentation approach is used to produce consistent decompositions across all facade images. Afterwards, a clustering step identifies semantically similar segments. Each cluster of similar segments is then used as the input for the joint segmentation in the next level of the hierarchy. We show that our approach produces consistent hierarchical segmentations on two different facade datasets. Furthermore, we argue that the discovered hierarchies capture essential structural information, which is demonstrated on the tasks of facade retrieval and virtual facade synthesis.
Proceduralization of Buildings at City Scale
Ilke Demir, Daniel G. Aliaga, Bedrich Benes. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.31

We present a framework for the conversion of existing 3D unstructured urban models into a compact procedural representation that enables model synthesis, querying, and simplification of large urban areas. During the de-instancing phase, dissimilarity-based clustering is performed to obtain a set of building components and component types. During the proceduralization phase, the components are arranged into a context-free grammar, which can be directly edited or interactively manipulated. We applied our approach to convert several large city models, with up to 19,000 building components spanning more than 180 km², into procedural models of a few thousand terminals and non-terminals and 50-100 rules.
Distortion Driven Variational Multi-view Reconstruction
Patricio A. Galindo, Rhaleb Zayer. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.99

This paper revisits variational multi-view stereo and identifies two issues pertaining to matching and view merging: i) regions with low visibility and relatively high depth variation are resolved solely by the regularizer contribution, which often induces wrong matches that tend to bleed into neighboring regions and, more importantly, distort nearby features; ii) small matching errors can lead to overlapping surface layers that cannot easily be addressed by standard outlier removal techniques. In both scenarios, we rely on the analysis of the distortion of spatial and planar maps to improve the quality of the reconstruction. At the matching level, an anisotropic diffusion driven by spatial grid distortion is proposed to steer grid lines away from the problematic regions. At the merging level, advantage is taken of Lambert's cosine law to favor contributions from image areas where the cosine of the angle between the surface normal and the line of sight is maximal. Tests on standard benchmarks suggest a good blend of computational efficiency, ease of implementation, and reconstruction quality.
A 3D Shape Descriptor for Segmentation of Unstructured Meshes into Segment-Wise Coherent Mesh Series
T. Mukasa, S. Nobuhara, Tony Tung, T. Matsuyama. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.20

This paper presents a novel shape descriptor for topology-based segmentation of 3D video sequences. 3D video is a series of 3D meshes without temporal correspondences; such correspondences would benefit applications including compression, motion analysis, and kinematic editing. In 3D video, both the 3D mesh connectivity and the global surface topology can change from frame to frame. This characteristic prevents establishing accurate temporal correspondences across the entire 3D mesh series. To overcome this difficulty, we propose a two-step strategy that decomposes the entire sequence into a series of topologically coherent segments using our new shape descriptor, and then estimates temporal correspondences on a per-segment basis. We demonstrate the robustness and accuracy of the shape descriptor on real data containing large non-rigid motions and reconstruction errors.
Improved Techniques for Multi-view Registration with Motion Averaging
Zhongyu Li, Jihua Zhu, Ke Lan, Chen Li, Chaowei Fang. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.23

Recently, motion averaging has been introduced as an effective means of solving the multi-view registration problem. This approach utilizes the Lie algebra to average many relative motions, each of which corresponds to the registration result of a scan pair involved in the multi-view registration. Accordingly, a key question is how to obtain an accurate registration between two partially overlapping scans. This paper presents a method to estimate the overlap percentage of each scan pair involved in multi-view registration. Furthermore, it applies the trimmed iterative closest point (TrICP) algorithm to obtain accurate relative motions for scan pairs with a high overlap percentage. It also introduces parallel computation to increase the efficiency of multi-view registration. Experimental results on public datasets illustrate its superiority over previous approaches.
Iterative Closest Spectral Kernel Maps
A. Shtern, R. Kimmel. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.24

An important operation in geometry processing is finding the correspondences between pairs of shapes. Measures of dissimilarity between surfaces have been found to be highly useful for nonrigid shape comparison. Here, we analyze the applicability of the spectral kernel distance for solving the shape matching problem. To align the spectral kernels, we introduce the iterative closest spectral kernel maps (ICSKM) algorithm. The ICSKM algorithm further extends the iterative closest point algorithm to the class of deformable shapes. The proposed method achieves state-of-the-art results on the Princeton isometric shape matching protocol applied, as usual, to the TOSCA and SCAPE benchmarks.
A TV Prior for High-Quality Local Multi-view Stereo Reconstruction
Andreas Kuhn, H. Mayer, H. Hirschmüller, D. Scharstein. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.76

Local fusion of disparity maps allows fast parallel 3D modeling of large scenes that do not fit into main memory. While existing methods assume a constant disparity uncertainty, disparity errors typically vary spatially from tenths of a pixel to several pixels. In this paper we propose a method that employs a set of Gaussians for different disparity classes, instead of a single error model with only one variance. The set of Gaussians is learned from the difference between generated disparity maps and ground-truth disparities. Pixels are assigned particular disparity classes based on a Total Variation (TV) feature measuring the local oscillation behavior of the 2D disparity map. This feature captures uncertainty caused, for instance, by lack of texture or by the fronto-parallel bias of the stereo method. Experimental results on several datasets in varying configurations demonstrate that our method yields improved performance both qualitatively and quantitatively.
Least MSE Regression for View Synthesis
Keita Takahashi, T. Fujii. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.29

View synthesis is the process of combining given multi-view images to generate an image from a new viewpoint. Assuming that each pixel of the new view is obtained as the weighted sum of the corresponding pixels from the input views, we focus on the problem of how to optimize the weight for each of the input views. Our weighting method is called least mean squared error (MSE) regression because it is formulated as a regression problem in which second order statistics among the viewpoints are exploited to minimize the MSE of the resulting image. More specifically, the affinity across the viewpoints is represented as a covariance and approximated using a linear model whose parameters are adapted for each dataset. By using the approximated covariance, the optimal weights can be successfully estimated. As a result, the weights derived using our method are data dependent and significantly differ from those obtained using current empirical methods such as distance penalty. Our method is still effective if the given correspondence is not completely accurate due to noise. We report on experimental results using several multi-view datasets to validate our theory and method.
Efficient Multiview Stereo by Random-Search and Propagation
Youngjung Uh, Y. Matsushita, H. Byun. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.35

We present an efficient multi-view 3D reconstruction method based on a random-search and propagation scheme. Our method progressively refines 3D point estimates by randomly perturbing the initial guesses of 3D points and propagating photo-consistent ones to their neighbors. In contrast to previous refinement methods that perform local optimization for better photo-consistency, our randomization approach exploits lucky matches to reduce the computational complexity. Experiments show the favorable efficiency of the proposed method, with accuracy close to that of state-of-the-art methods.
Recovering Correct Reconstructions from Indistinguishable Geometry
Jared Heinly, Enrique Dunn, Jan-Michael Frahm. In 2014 2nd International Conference on 3D Vision. DOI: 10.1109/3DV.2014.84

Structure-from-motion (SFM) is widely utilized to generate 3D reconstructions from unordered photo-collections. However, in the presence of non-unique, symmetric, or otherwise indistinguishable structure, SFM techniques often incorrectly reconstruct the final model. We propose a method that not only determines if an error is present, but automatically corrects the error in order to produce a correct representation of the scene. We find that by exploiting the co-occurrence information present in the scene's geometry, we can successfully isolate the 3D points causing the incorrect result. This allows us to split an incorrect reconstruction into error-free sub-models that we then correctly merge back together. Our experimental results show that our technique is efficient, robust to a variety of scenes, and outperforms existing methods.