RGB-D (color + 3D point cloud) scene labeling has received much attention due to affordable RGB-D sensors such as the Microsoft Kinect. To fully exploit RGB-D data, it is critical to develop robust features that reliably describe the 3D shape of the point cloud. Previous work extracts SIFT-like features directly from the depth dimension while ignoring the important height dimension of the 3D point cloud. In this paper, we propose to describe a 3D scene using height gradient information and introduce a new compact point cloud feature called the Height Gradient Histogram (HIGH). Using TextonBoost as the pixel classifier, experiments on two benchmark 3D scene labeling datasets show that the HIGH feature handles intra-category variations of object classes well and significantly improves class-average accuracy over state-of-the-art results. We will publish the code of the HIGH feature for the community.
{"title":"Height Gradient Histogram (HIGH) for 3D Scene Labeling","authors":"Gangqiang Zhao, Junsong Yuan, K. Dang","doi":"10.1109/3DV.2014.16","DOIUrl":"https://doi.org/10.1109/3DV.2014.16","url":null,"abstract":"RGB-D (color + 3D point cloud) based scene labeling has received much attention due to the affordable RGB-D sensors such as Microsoft Kinect. To fully utilize the RGB-D data, it is critical to develop robust features that can reliably describe the 3D shape information of the point cloud data. Previous work has proposed to extract SIFT-like features from the depth dimension data directly while ignored the important height dimension data of the 3D point cloud. In this paper, we propose to describe 3D scene using height gradient information and propose a new compact point cloud feature called Height Gradient Histogram (HIGH). Using Text on Boost as the pixel classifier, the experiments on two benchmarked 3D scene labeling datasets show that HIGH feature can well handle the intra-category variations of object class, and significantly improve class-average accuracy compared with the state-of-the-art results. We will publish the code of HIGH feature for the community.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123984207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Srinath Sridhar, Helge Rhodin, H. Seidel, Antti Oulasvirta, C. Theobalt
Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real time. The main contributions are a new generative tracking method that employs an implicit hand shape representation based on a Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is smooth and analytically differentiable, making fast gradient-based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from the literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.
{"title":"Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model","authors":"Srinath Sridhar, Helge Rhodin, H. Seidel, Antti Oulasvirta, C. Theobalt","doi":"10.1109/3DV.2014.37","DOIUrl":"https://doi.org/10.1109/3DV.2014.37","url":null,"abstract":"Real-time marker-less hand tracking is of increasing importance in human-computer interaction. Robust and accurate tracking of arbitrary hand motion is a challenging problem due to the many degrees of freedom, frequent self-occlusions, fast motions, and uniform skin color. In this paper, we propose a new approach that tracks the full skeleton motion of the hand from multiple RGB cameras in real-time. The main contributions include a new generative tracking method which employs an implicit hand shape representation based on Sum of Anisotropic Gaussians (SAG), and a pose fitting energy that is smooth and analytically differentiable making fast gradient based pose optimization possible. This shape representation, together with a full perspective projection model, enables more accurate hand modeling than a related baseline method from literature. Our method achieves better accuracy than previous methods and runs at 25 fps. We show these improvements both qualitatively and quantitatively on publicly available datasets.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121586354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The 4-Points Congruent Sets (4PCS) algorithm is a state-of-the-art RANSAC-based algorithm for registering two partially overlapping 3D point sets using raw points. Unlike other RANSAC-based algorithms, which try to achieve registration by searching for matching 3-point bases, it uses a base of two coplanar pairs of points to reduce the search space of matching bases. In this work, we first generalize the algorithm by allowing the two pairs to fall on two different planes separated by an arbitrary distance, i.e., the degree of separation. Furthermore, we show that increasing the degree of separation exponentially decreases the search space of matching bases. Using this property, we show that the new generalized base allows more efficient registration than the original 4PCS base type. We achieve a maximum run-time improvement of 83.10% for 3D registration.
{"title":"Generalized 4-Points Congruent Sets for 3D Registration","authors":"Mustafa Mohamad, D. Rappaport, M. Greenspan","doi":"10.1109/3DV.2014.21","DOIUrl":"https://doi.org/10.1109/3DV.2014.21","url":null,"abstract":"The 4-Points Congruent Sets (4PCS) algorithm is a state-of-the-art RANSAC-based algorithm for registering two partially overlapping 3D point sets using raw points. Unlike other RANSAC-based algorithms, which try to achieve registration by searching for matching 3-point bases, it uses a base of two coplanar pairs of points to reduce the search space matching bases. In this work, we first generalize the algorithm by allowing the two pairs to fall on two different planes which have an arbitrary distance, i.e. Degree of separation, between them. Furthermore, we show that increasing the degree of separation exponentially decreases the search space of matching bases. Using this property, we show that using the new generalized base allows for more efficient registration than the original 4PCS base type. We achieve a maximum run-time improvement of 83.10% for 3D registration.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"306 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123077325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaokun Wu, Chuan Li, Michael Wand, K. Hildebrandt, Silke Jansen, H. Seidel
Texture synthesis is a versatile tool for creating and editing 2D images. However, applying it to 3D content creation is difficult due to the higher demand for model accuracy and the large search space, which also contains many implausible shapes. Our paper explores offset statistics for 3D shape retargeting. We observe that the offset histograms between similar 3D features are sparse, in particular for man-made objects such as buildings and furniture. We employ sparse offset statistics to improve 3D shape retargeting (i.e., rescaling in different directions). We employ a graph-cut texture synthesis method that iteratively stitches model fragments shifted by the detected sparse offsets. The offsets reveal important structural redundancy, which leads to more plausible results and more efficient optimization. Our method is fully automatic, while intuitive user control can be incorporated for interactive modeling in real time. We empirically evaluate the sparsity of offset statistics across a wide range of subjects, and show that our statistics-based retargeting significantly improves quality and efficiency over conventional MRF models.
{"title":"3D Model Retargeting Using Offset Statistics","authors":"Xiaokun Wu, Chuan Li, Michael Wand, K. Hildebrandt, Silke Jansen, H. Seidel","doi":"10.1109/3DV.2014.74","DOIUrl":"https://doi.org/10.1109/3DV.2014.74","url":null,"abstract":"Texture synthesis is a versatile tool for creating and editing 2D images. However, applying it to 3D content creation is difficult due to the higher demand of model accuracy and the large search space that also contains many implausible shapes. Our paper explores offset statistics for 3D shape retargeting. We observe that the offset histograms between similar 3D features are sparse, in particular for man-made objects such as buildings and furniture. We employ sparse offset statistics to improve 3D shape retargeting (i.e., Rescaling in different directions). We employ a graph-cut texture synthesis method that iteratively stitches model fragments shifted by the detected sparse offsets. The offsets reveal important structural redundancy which leads to more plausible results and more efficient optimization. Our method is fully automatic, while intuitive user control can be incorporated for interactive modeling in real-time. We empirically evaluate the sparsity of offset statistics across a wide range of subjects, and show our statistics based retargeting significantly improves quality and efficiency over conventional MRF models.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123693253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Cansizoglu, Yuichi Taguchi, S. Ramalingam, Yohei Miki
We present a simple method for calibrating a set of cameras that may not have overlapping fields of view. We reduce the problem of calibrating the non-overlapping cameras to the problem of localizing the cameras with respect to a global 3D model reconstructed with a simultaneous localization and mapping (SLAM) system. Specifically, we first reconstruct such a global 3D model using a SLAM system with an RGB-D sensor. We then perform localization and intrinsic parameter estimation for each camera using 2D-3D correspondences between the camera and the 3D model. Our method locates the cameras within the 3D model, which is useful for visually inspecting camera poses and provides a model-guided browsing interface for the images. We demonstrate the advantages of our method using several indoor scenes.
{"title":"Calibration of Non-overlapping Cameras Using an External SLAM System","authors":"E. Cansizoglu, Yuichi Taguchi, S. Ramalingam, Yohei Miki","doi":"10.1109/3DV.2014.106","DOIUrl":"https://doi.org/10.1109/3DV.2014.106","url":null,"abstract":"We present a simple method for calibrating a set of cameras that may not have overlapping field of views. We reduce the problem of calibrating the non-overlapping cameras to the problem of localizing the cameras with respect to a global 3D model reconstructed with a simultaneous localization and mapping (SLAM) system. Specifically, we first reconstruct such a global 3D model by using a SLAM system using an RGB-D sensor. We then perform localization and intrinsic parameter estimation for each camera by using 2D-3D correspondences between the camera and the 3D model. Our method locates the cameras within the 3D model, which is useful for visually inspecting camera poses and provides a model-guided browsing interface of the images. We demonstrate the advantages of our method using several indoor scenes.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124850271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Basaru, Chris Child, Eduardo Alonso, G. Slabaugh
Current depth capturing devices show serious drawbacks in certain applications, for example, ego-centric depth recovery: they are cumbersome, have a high power requirement, and do not provide high resolution at near distance. Stereo-matching techniques are a suitable alternative, but whilst the idea behind these techniques is simple, it is well known that recovering an accurate disparity map by stereo matching requires overcoming three main problems: occluded regions that lack corresponding pixels, noise in the image capturing sensor, and inconsistent color and brightness between the captured images. We propose a modified version of the Census-Hamming cost function that allows more robust matching, with an emphasis on improving performance under radiometric variations of the input images.
{"title":"Quantized Census for Stereoscopic Image Matching","authors":"R. Basaru, Chris Child, Eduardo Alonso, G. Slabaugh","doi":"10.1109/3DV.2014.83","DOIUrl":"https://doi.org/10.1109/3DV.2014.83","url":null,"abstract":"Current depth capturing devices show serious drawbacks in certain applications, for example ego-centric depth recovery: they are cumbersome, have a high power requirement, and do not portray high resolution at near distance. Stereo-matching techniques are a suitable alternative, but whilst the idea behind these techniques is simple it is well known that recovery of an accurate disparity map by stereo-matching requires overcoming three main problems: occluded regions causing absence of corresponding pixels, existence of noise in the image capturing sensor and inconsistent color and brightness in the captured images. We propose a modified version of the Census-Hamming cost function which allows more robust matching with an emphasis on improving performance under radiometric variations of the input images.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128353800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Boxin Shi, K. Inose, Y. Matsushita, P. Tan, Sai-Kit Yeung, K. Ikeuchi
Photometric stereo using unorganized Internet images is very challenging, because the input images are captured under unknown general illuminations with uncontrolled cameras. We propose to solve this difficult problem with a simple yet effective approach that makes use of a coarse shape prior. The shape prior is obtained from multi-view stereo and is useful in two ways: resolving the shape-light ambiguity in uncalibrated photometric stereo and guiding the estimated normals to produce a high-quality 3D surface. By assuming the surface albedo is not highly contrasted, we also propose a novel linear approximation of the nonlinear camera responses within our normal estimation algorithm. We evaluate our method using synthetic data and demonstrate the surface improvement over multi-view stereo results on real data.
{"title":"Photometric Stereo Using Internet Images","authors":"Boxin Shi, K. Inose, Y. Matsushita, P. Tan, Sai-Kit Yeung, K. Ikeuchi","doi":"10.1109/3DV.2014.9","DOIUrl":"https://doi.org/10.1109/3DV.2014.9","url":null,"abstract":"Photometric stereo using unorganized Internet images is very challenging, because the input images are captured under unknown general illuminations, with uncontrolled cameras. We propose to solve this difficult problem by a simple yet effective approach that makes use of a coarse shape prior. The shape prior is obtained from multi-view stereo and will be useful in twofold: resolving the shape-light ambiguity in uncalibrated photometric stereo and guiding the estimated normals to produce the high quality 3D surface. By assuming the surface albedo is not highly contrasted, we also propose a novel linear approximation of the nonlinear camera responses with our normal estimation algorithm. We evaluate our method using synthetic data and demonstrate the surface improvement on real data over multi-view stereo results.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117230042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dorra Larnaout, V. Gay-Bellile, S. Bourgeois, M. Dhome
In this paper, we improve the localization accuracy of visual SLAM (VSLAM) / GPS fusion in dense urban areas by using 3D building models provided by a Geographic Information System (GIS). GPS inaccuracies are corrected by comparing the reconstruction resulting from the VSLAM / GPS fusion with the 3D building models. The corrected GPS data are then re-injected into the fusion process. Experimental results demonstrate the accuracy improvements achieved by our solution.
{"title":"Vision-Based Differential GPS: Improving VSLAM / GPS Fusion in Urban Environment with 3D Building Models","authors":"Dorra Larnaout, V. Gay-Bellile, S. Bourgeois, M. Dhome","doi":"10.1109/3DV.2014.73","DOIUrl":"https://doi.org/10.1109/3DV.2014.73","url":null,"abstract":"We improve in this paper the localization accuracy of visual SLAM (VSLAM) / GPS fusion in dense urban area by using 3D building models provided by Geographic Information System (GIS). GPS inaccuracies are corrected by comparison of the reconstruction resulting from the VSLAM / GPS fusion with 3D building models. These corrected GPS data are thereafter re-injected in the fusion process. Experimental results demonstrate the accuracy improvements achieved through our proposed solution.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121367774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of curvature in volumetric datasets is considered. The first contribution is the extension of several methods known for curvature estimation in range images to the new domain of volumetric datasets. The second is a comparative examination of the (1) accuracy and (2) computational performance of these extensions and of five well-known existing methods for curvature estimation in volumetric datasets.
{"title":"On Reliable Estimation of Curvatures of Implicit Surfaces","authors":"Jacob D. Hauenstein, Timothy S Newman","doi":"10.1109/3DV.2014.30","DOIUrl":"https://doi.org/10.1109/3DV.2014.30","url":null,"abstract":"Estimation of curvature in volumetric datasets is considered. One component of the exhibition here is new extensions of several known methods for such estimations in range images to the new domain of volumetric datasets. A second component is that the (1) accuracy and (2) computational performance of these extensions (and five well-known existing methods for curvature estimation in volumetric datasets) are comparatively examined.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126381844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose a novel covariance-based framework for the robust characterization and classification of human gestures in 3D depth sequences. The proposed 4DCov descriptor uses the notion of covariance to create compact representations of the complex interactions between variations of 3D features in the spatial and temporal domains, instead of using the absolute features themselves. Despite its compactness, this representation still offers discriminative power for human-gesture classification. Encoding feature variations over a scene makes our descriptor robust to inter-subject and intra-class variations, periodic motions, and different speeds of gesture execution, compared to other keypoint- or histogram-based descriptor approaches. Furthermore, a sparse collaborative classification method is also presented, taking advantage of the fact that our descriptor lies on a specific manifold topology and that similar motions cluster geometrically in the descriptor space. Classification accuracy results are presented against state-of-the-art approaches on four public human gesture datasets acquired with 3D depth sensors, including complex gestures of different natures.
{"title":"4DCov: A Nested Covariance Descriptor of Spatio-Temporal Features for Gesture Recognition in Depth Sequences","authors":"Pol Cirujeda, Xavier Binefa","doi":"10.1109/3DV.2014.10","DOIUrl":"https://doi.org/10.1109/3DV.2014.10","url":null,"abstract":"In this paper we propose a novel covariance-based framework for the robust characterization and classification of human gestures in 3D depth sequences. The proposed 4DCov descriptor uses the notion of covariance to create compact representations of complex interactions between variations of 3D features in the spatial and temporal domain, instead of using the absolute features themselves. Despite the compactness of this representation, it still offers discriminative power for human-gesture classification. The codification of feature variations along a scene makes our descriptor robust to inter-subject and intra-class variations, periodic motions and different speeds during gesture executions, compared to other key point or histogram-based descriptor approaches. Furthermore, a sparse collaborative classification method is also presented, taking advantage of our descriptor laying on a specific manifold topology and observing that similar motions are geometrically clustered in the descriptor space. Classification accuracy results are presented against state-of-the-art approaches on top of four public human gesture datasets acquired with 3D depth sensor devices, including complex gestures from different natures.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131682014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}