Detailed 3D Model Driven Single View Scene Understanding
M. Rashid, M. Hebert. DOI: 10.1109/3DV.2014.32

We present a data-driven approach to holistic scene understanding. From a single image of an indoor scene, our approach estimates its detailed 3D geometry, i.e., the location of its walls and floor and the 3D appearance of the objects it contains, as well as its semantic meaning, i.e., a prediction of which objects it contains. This is made possible by using large datasets of detailed 3D models alongside appearance-based detectors. We first estimate the 3D layout of a room and extrapolate 2D object detection hypotheses to three dimensions to form bounding cuboids. Cuboids are converted to detailed 3D models of the predicted semantic category. Combinations of 3D models are used to create a large list of layout hypotheses for each image, where each layout hypothesis is semantically meaningful and geometrically plausible. The likelihood of each layout hypothesis is ranked using a learned linear model, and the hypothesis with the highest predicted likelihood is the final predicted 3D layout. Our approach is able to recover the detailed geometry of scenes, provide precise segmentation of objects in the image plane, and estimate objects' pose in 3D.
Solving for Relative Pose with a Partially Known Rotation is a Quadratic Eigenvalue Problem
Chris Sweeney, John Flynn, M. Turk. DOI: 10.1109/3DV.2014.66

We propose a novel formulation of minimal-case solutions for determining the relative pose of perspective and generalized cameras given a partially known rotation, namely, a known axis of rotation. An axis of rotation may be easily obtained by detecting vertical vanishing points with computer vision techniques, or with the aid of sensor measurements from a smartphone. Given a known axis of rotation, our algorithms solve for the angle of rotation around the known axis along with the unknown translation. We formulate these relative pose problems as quadratic eigenvalue problems, which are very simple to construct. We run several experiments on synthetic and real data to compare our methods to the current state-of-the-art algorithms. Our methods provide several advantages over alternative methods, including efficiency and accuracy, particularly in the presence of image and sensor noise, as is often the case for mobile devices.
Learning Similarities for Rigid and Non-rigid Object Detection
Asako Kanezaki, E. Rodolà, D. Cremers, T. Harada. DOI: 10.1109/3DV.2014.61

In this paper, we propose an optimization method for estimating the parameters that typically appear in graph-theoretical formulations of the matching problem for object detection. Although several methods have been proposed to optimize parameters for graph matching so as to promote correct correspondences and suppress wrong ones, our approach is novel in that it aims at improving performance in the more general task of object detection. In our formulation, similarity functions are adjusted so as to increase the overall similarity between a reference model and the observed target, and at the same time reduce the similarity between the reference and "non-target" objects. We evaluate the proposed method in two challenging scenarios, namely object detection using data captured with a Kinect sensor in a real environment, and intrinsic metric learning for deformable shapes, demonstrating substantial improvements in both settings.
Influence of Colour and Feature Geometry on Multi-modal 3D Point Clouds Data Registration
Hansung Kim, A. Hilton. DOI: 10.1109/3DV.2014.51

With the ongoing transition of digital content from 2D to 3D, the problem of 3D data matching and registration is increasingly important. Registration of multi-modal 3D data acquired from different sensors remains a challenging problem due to differences in the types and characteristics of the data. In this paper, we evaluate the registration performance of 3D feature descriptors with different domains on datasets from various environments and modalities. Datasets are acquired in indoor and outdoor environments with 2D and 3D sensing devices including LIDAR, spherical imaging, digital cameras and RGBD cameras. FPFH, PFH and SHOT feature descriptors are applied to the 3D point clouds generated from the multi-modal datasets. Local neighbouring point distributions, keypoint distributions, colour information and their combinations are used for feature description. Finally, we analyse their influence on multi-modal 3D point cloud registration.
Robust Absolute Rotation Estimation via Low-Rank and Sparse Matrix Decomposition
F. Arrigoni, L. Magri, B. Rossi, P. Fragneto, Andrea Fusiello. DOI: 10.1109/3DV.2014.48

This paper proposes a robust method to solve the absolute rotation estimation problem, which arises in global registration of 3D point sets and in structure-from-motion. A novel cost function is formulated which inherently copes with outliers. In particular, the proposed algorithm handles both outlier and missing relative rotations by casting the problem as a "low-rank & sparse" matrix decomposition. As a side effect, this solution can be seen as a valid and cost-effective detector of inconsistent pairwise rotations. Computational efficiency and numerical accuracy are demonstrated by simulated and real experiments.
Rapid SVBRDF Measurement by Algebraic Solution Based on Adaptive Illumination
Leo Miyashita, Yoshihiro Watanabe, M. Ishikawa. DOI: 10.1109/3DV.2014.41

In this paper, we propose an algebraic solution for rapid SVBRDF measurement. The algebraic approach requires only a few reflectance samples to obtain the parameters described by the physically based Cook-Torrance model. This solution, however, also imposes constraints on the light and normal directions in the acquisition process. To meet these constraints, we developed a system that changes the illumination according to the target 3D shape at high speed. As a result, the proposed method provides BRDF parameters at each texel without optimization or over-sampling. We demonstrated rapid measurement on real objects that do not have uniform reflectance and confirmed the validity of this approach by comparison with conventional methods.
A Layered Model of Human Body and Garment Deformation
A. Neophytou, A. Hilton. DOI: 10.1109/3DV.2014.52

In this paper we present a framework for learning a three-layered model of human shape, pose and garment deformation. The proposed deformation model provides intuitive, independent control over the three parameters while producing aesthetically pleasing deformations of both the garment and the human body. The shape and pose deformation layers of the model are trained on a rich dataset of full-body 3D scans of human subjects in a variety of poses. The garment deformation layer is trained on animated mesh sequences of dressed actors and relies on a novel technique for human shape and posture estimation under clothing. The key contribution of this paper is that we consider garment deformations as the residual transformations between a naked mesh and the dressed mesh of the same subject.
Two Cameras and a Screen: How to Calibrate Mobile Devices?
Amaël Delaunoy, Jia Li, Bastien Jacquet, M. Pollefeys. DOI: 10.1109/3DV.2014.102

We propose a new approach to estimating the geometric extrinsic calibration of all the elements of a smartphone or tablet (such as the screen and the front and back cameras) using a planar mirror. By moving a smartphone in front of a single static planar mirror, it is possible to establish correspondences between the images and a pattern displayed on the screen, and therefore to estimate the geometric relationship between the non-overlapping cameras and the screen. The newly proposed setup (static mirror, moving smartphone) both improves on the state of the art by working in the minimal case of two images, and improves accuracy when more images are available. We analyze the minimal case for different calibration scenarios and evaluate the proposed approach on several datasets. We also show an application of this geometric calibration to specular surface reconstruction, by observing the reflection of a known pattern displayed on the screen.
Calibration of 3D Sensors Using a Spherical Target
Minghao Ruan, Daniel F. Huber. DOI: 10.1109/3DV.2014.100

With the emergence of relatively low-cost real-time 3D imaging sensors, new applications for suites of 3D sensors are becoming practical. For example, 3D sensors in an industrial robotic work cell can monitor workers' positions to ensure their safety. This paper introduces a simple-to-use method for extrinsic calibration of multiple 3D sensors observing a common workspace. Traditional planar-target camera calibration techniques are not well suited to such situations, because multiple cameras may not observe the same target. Our method uses a hand-held spherical target, which is imaged from various points within the workspace. The algorithm automatically detects the sphere in a sequence of views and simultaneously estimates the sphere centers and extrinsic parameters to align an arbitrary network of 3D sensors. We demonstrate the approach with examples of calibrating heterogeneous collections of 3D cameras and achieve better results than traditional image-based calibration.
World-Base Calibration by Global Polynomial Optimization
Jan Heller, T. Pajdla. DOI: 10.1109/3DV.2014.78

This paper presents a novel solution to the world-base calibration problem. It is applicable in situations where a known calibration target is observed by a camera attached to the end effector of a robotic manipulator. The presented method works by minimizing a geometrically meaningful error function based on image projections. Our formulation leads to a non-convex multivariate polynomial optimization problem of constant size. However, we show how such a problem can be relaxed using linear matrix inequality (LMI) relaxations and effectively solved using semidefinite programming. Although the technique of LMI relaxations guarantees only a lower bound on the global minimum of the original problem, it can provide a certificate of optimality in cases when the global minimum is reached. Indeed, we reached the global minimum for all calibration tasks in our experiments with both synthetic and real data. The experiments also show that the presented method is fast and noise resistant.