In this paper we present a novel method to accurately estimate the dense 3D motion field, known as scene flow, from depth and intensity acquisitions. The method is formulated as a convex energy optimization, where the motion warping of each scene point is estimated through a projection and back-projection directly in 3D space. We utilize higher order regularization which is weighted and directed according to the input data by an anisotropic diffusion tensor. Our formulation enables the calculation of a dense flow field which does not penalize smooth and non-rigid movements while aligning motion boundaries with strong depth boundaries. An efficient parallelization of the numerical algorithm leads to runtimes on the order of 1 s and therefore enables the method to be used in a variety of applications. We show that this novel scene flow calculation outperforms existing approaches in terms of speed and accuracy. Furthermore, we demonstrate applications such as camera pose estimation and depth image super-resolution, which are enabled by the high accuracy of the proposed method. We show these applications using modern depth sensors such as the Microsoft Kinect or the PMD Nano Time-of-Flight sensor.
{"title":"aTGV-SF: Dense Variational Scene Flow through Projective Warping and Higher Order Regularization","authors":"David Ferstl, Christian Reinbacher, G. Riegler, M. Rüther, H. Bischof","doi":"10.1109/3DV.2014.19","DOIUrl":"https://doi.org/10.1109/3DV.2014.19","url":null,"abstract":"In this paper we present a novel method to accurately estimate the dense 3D motion field, known as scene flow, from depth and intensity acquisitions. The method is formulated as a convex energy optimization, where the motion warping of each scene point is estimated through a projection and back-projection directly in 3D space. We utilize higher order regularization which is weighted and directed according to the input data by an anisotropic diffusion tensor. Our formulation enables the calculation of a dense flow field which does not penalize smooth and non-rigid movements while aligning motion boundaries with strong depth boundaries. An efficient parallelization of the numerical algorithm leads to runtimes in the order of 1s and therefore enables the method to be used in a variety of applications. We show that this novel scene flow calculation outperforms existing approaches in terms of speed and accuracy. Furthermore, we demonstrate applications such as camera pose estimation and depth image super resolution, which are enabled by the high accuracy of the proposed method. We show these applications using modern depth sensors such as Microsoft Kinect or the PMD Nano Time-of-Flight sensor.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132407183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Creating textured 3D scans of indoor environments has experienced a large boost with the advent of cheap commodity depth sensors. However, the quality of the acquired 3D models is often impaired by color seams in the reconstruction due to varying illumination (e.g., shadows or highlights) and object surfaces whose brightness and color vary with the viewpoint of the camera. In this paper, we propose a direct and simple method to estimate the pure albedo of the texture, which allows us to remove illumination effects from IR and color images. Our approach first computes the illumination-independent albedo in the IR domain, which we subsequently transfer to the color albedo. As shadows and highlights lead to over- and underexposed image regions with little or no color information, we apply an advanced optimization scheme to infer color information in the color albedo from neighboring image regions. We demonstrate the applicability of our approach to various real-world scenes.
{"title":"Towards Illumination-Invariant 3D Reconstruction Using ToF RGB-D Cameras","authors":"C. Kerl, Mohamed Souiai, Jürgen Sturm, D. Cremers","doi":"10.1109/3DV.2014.62","DOIUrl":"https://doi.org/10.1109/3DV.2014.62","url":null,"abstract":"Creating textured 3D scans of indoor environments has experienced a large boost with the advent of cheap commodity depth sensors. However, the quality of the acquired 3D models is often impaired by color seams in the reconstruction due to varying illumination (e.g., Shadows or highlights) and object surfaces whose brightness and color vary with the viewpoint of the camera. In this paper, we propose a direct and simple method to estimate the pure albedo of the texture, which allows us to remove illumination effects from IR and color images. Our approach first computes the illumination-independent albedo in the IR domain, which we subsequently transfer to the color albedo. As shadows and highlights lead to over- and underexposed image regions with little or no color information, we apply an advanced optimization scheme to infer color information in the color albedo from neigh boring image regions. We demonstrate the applicability of our approach to various real-world scenes.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127847850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For large collections of 3D models, the ability to detect and localize parts of interest is necessary to provide search and visualization enhancements beyond simple high-level categorization. While current 3D labeling approaches rely on learning from fully labeled meshes, such training data is difficult to acquire at scale. In this work we explore learning to detect object parts from sparsely labeled data, i.e. we operate under the assumption that for any object part we have only one labeled vertex rather than a full region segmentation. Similarly, we also learn to output a single representative vertex for each detected part. Such localized predictions are useful for applications where visualization is important. Our approach relies heavily on exploiting the spatial configuration of parts on a model to drive the detection. Inspired by structured multi-class object detection models for images, we develop an algorithm that combines independently trained part classifiers with a structured SVM model, and show promising results on real-world textured 3D data.
{"title":"Learning 3D Part Detection from Sparsely Labeled Data","authors":"A. Makadia, M. E. Yümer","doi":"10.1109/3DV.2014.108","DOIUrl":"https://doi.org/10.1109/3DV.2014.108","url":null,"abstract":"For large collections of 3D models, the ability to detect and localize parts of interest is necessary to provide search and visualization enhancements beyond simple high-level categorization. While current 3D labeling approaches rely on learning from fully labeled meshes, such training data is difficult to acquire at scale. In this work we explore learning to detect object parts from sparsely labeled data, i.e. we operate under the assumption that for any object part we have only one labeled vertex rather than a full region segmentation. Similarly, we also learn to output a single representative vertex for each detected part. Such localized predictions are useful for applications where visualization is important. Our approach relies heavily on exploiting the spatial configuration of parts on a model to drive the detection. Inspired by structured multi-class object detection models for images, we develop an algorithm that combines independently trained part classifiers with a structured SVM model, and show promising results on real-world textured 3D data.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130034767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For the design of mass-produced wearable objects for a population it is important to find a small number of sizes, called a sizing system, that will fit well on a wide range of individuals in the population. To obtain a sizing system that incorporates the shape of an identity along with its motion, we introduce a general framework to generate a sizing system for dynamic 3D motion data. Based on a registered 3D motion database, a sizing system is computed for task-specific anthropometric measurements and tolerances specified by designers. We generate the sizing system by transforming the problem into a box stabbing problem, which aims to find the lowest number of points stabbing a set of boxes. We solve this with a standard computational geometry technique that recursively computes the stabbing of lower-dimensional boxes. We apply our framework to a database of facial motion data for anthropometric measurements related to the design of face masks. We show the generalization capabilities of this sizing system on unseen data, and compute, for each size, a representative 3D shape that can be used by designers to produce a prototype model.
{"title":"A General Framework to Generate Sizing Systems from 3D Motion Data Applied to Face Mask Design","authors":"Timo Bolkart, P. Bose, Chang Shu, S. Wuhrer","doi":"10.1109/3DV.2014.43","DOIUrl":"https://doi.org/10.1109/3DV.2014.43","url":null,"abstract":"For the design of mass-produced wearable objects for a population it is important to find a small number of sizes, called a sizing system, that will fit well on a wide range of individuals in the population. To obtain a sizing system that incorporates the shape of an identity along with its motion, we introduce a general framework to generate a sizing system for dynamic 3D motion data. Based on a registered 3D motion database a sizing system is computed for task-specific anthropometric measurements and tolerances, specified by designers. We generate the sizing system by transforming the problem into a box stabbing problem, which aims to find the lowest number of points stabbing a set of boxes. We use a standard computational geometry technique to solve this, it recursively computes the stabbing of lower-dimensional boxes. We apply our framework to a database of facial motion data for anthropometric measurements related to the design of face masks. We show the generalization capabilities of this sizing system on unseen data, and compute, for each size, a representative 3D shape that can be used by designers to produce a prototype model.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116204002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper makes two contributions in the context of line-based camera pose estimation: 1) we propose a purely geometric approach to establish correspondences between 3D line segments in a given model and 2D line segments detected in an image, and 2) we eliminate a degenerate case caused by the rotation representation in arguably the best line-based pose estimation method currently available. For establishing line correspondences we perform an exhaustive search over the space of camera pose values until we obtain a pose (position and rotation) which is geometrically consistent with the given set of 2D and 3D lines. For this highly complex search we design a strategy which performs precomputations on the 3D model using separate sets of constraints on position and rotation values. During runtime, the rotation values are ranked independently and combined with each position value in the order of their ranking. Successive geometric constraints, which are much simpler than computing the reprojection error, are then used to eliminate incorrect pose values. We show that the ranking of rotation values reduces the number of trials needed by a large factor and that the simple geometric constraints avoid the need for computing the reprojection error in most cases. Though the execution time of the current MATLAB implementation is far from real-time requirements, our method can be accelerated significantly by exploiting the simplicity and parallelizability of the operations we employ. For eliminating the degenerate case in the state-of-the-art pose estimation method, we reformulate the rotation representation, using unit quaternions instead of the CGR parameters used by that method.
{"title":"Line Matching and Pose Estimation for Unconstrained Model-to-Image Alignment","authors":"K. Bhat, J. Heikkilä","doi":"10.1109/3DV.2014.27","DOIUrl":"https://doi.org/10.1109/3DV.2014.27","url":null,"abstract":"This paper has two contributions in the context of line based camera pose estimation, 1) We propose a purely geometric approach to establish correspondence between 3D line segments in a given model and 2D line segments detected in an image, 2) We eliminate a degenerate case due to the type of rotation representation in arguably the best line based pose estimation method currently available. For establishing line correspondences we perform exhaustive search on the space of camera pose values till we obtain a pose (position and rotation) which is geometrically consistent with the given set of 2D, 3D lines. For this highly complex search we design a strategy which performs precomputations on the 3D model using separate set of constraints on position and rotation values. During runtime, the set of different rotation values are ranked independently and combined with each position values in the order of their ranking. Then successive geometric constraints which are much simpler when compared to computing reprojection error are used to eliminate incorrect pose values. We show that the ranking of rotation values reduces the number of trials needed by a huge factor and the simple geometric constraints avoid the need for computing the reprojection error in most cases. Though the execution time for the current MATLAB implementation is far from real time requirement, our method can be accelerated significantly by exploiting simplicity and parallelizability of the operations we employ. For eliminating the degenerate case in the state of art pose estimation method, we reformulate the rotation representation. We use unit quaternions instead of CGR parameters used by the method.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123913254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Obtaining a good baseline between different video frames is one of the key elements in vision-based monocular SLAM systems. However, if the video frames contain only a few 2D feature correspondences with a good baseline, or the camera only rotates without sufficient translation in the beginning, tracking and mapping becomes unstable. We introduce a real-time visual SLAM system that incrementally tracks individual 2D features, and estimates camera pose by using matched 2D features, regardless of the length of the baseline. Triangulating 2D features into 3D points is deferred until key frames with sufficient baseline for the features are available. Our method can also deal with pure rotational motions, and fuse the two types of measurements in a bundle adjustment step. Adaptive criteria for key frame selection are also introduced for efficient optimization and dealing with multiple maps. We demonstrate that our SLAM system improves camera pose estimates and robustness, even with purely rotational motions.
{"title":"DT-SLAM: Deferred Triangulation for Robust SLAM","authors":"D. Herrera C., Kihwan Kim, Juho Kannala, K. Pulli, J. Heikkila","doi":"10.1109/3DV.2014.49","DOIUrl":"https://doi.org/10.1109/3DV.2014.49","url":null,"abstract":"Obtaining a good baseline between different video frames is one of the key elements in vision-based monocular SLAM systems. However, if the video frames contain only a few 2D feature correspondences with a good baseline, or the camera only rotates without sufficient translation in the beginning, tracking and mapping becomes unstable. We introduce a real-time visual SLAM system that incrementally tracks individual 2D features, and estimates camera pose by using matched 2D features, regardless of the length of the baseline. Triangulating 2D features into 3D points is deferred until key frames with sufficient baseline for the features are available. Our method can also deal with pure rotational motions, and fuse the two types of measurements in a bundle adjustment step. Adaptive criteria for key frame selection are also introduced for efficient optimization and dealing with multiple maps. We demonstrate that our SLAM system improves camera pose estimates and robustness, even with purely rotational motions.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132276905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Some objects have layered structures in which a dynamic region is covered by a static layer. In this paper, we propose a new method for estimating the depth of the dynamic region using speckle analysis. The speckle is caused by the mutual interference of a coherent laser. We use two characteristics of the speckle: the temporal stability of the speckle pattern and the wavelength dependency of the transmittance of the laser. We estimate the depth by computing correlations of speckle patterns using multispectral lasers. Experimental results using simulated skin show that multispectral speckle correlation can be used for analyzing a layered structure.
{"title":"Estimating Depth of Layered Structure Based on Multispectral Speckle Correlation","authors":"T. Matsumura, Y. Mukaigawa, Y. Yagi","doi":"10.1109/3DV.2014.90","DOIUrl":"https://doi.org/10.1109/3DV.2014.90","url":null,"abstract":"Some objects have layered structures in which a dynamic region is covered by a static layer. In this paper, we propose a new experiment for estimating the depth of the dynamic region using speckle analysis. The speckle is caused by the mutual interference of a coherence laser. We use two characteristics of the speckle. One is the temporal stability of the speckle pattern and the other is the wavelength dependency of the transmittance of the laser. We estimate the depth by computing correlations of speckle patterns using multispectral lasers. Experimental results using a simulated skin show that multispectral speckle correlation can be used for analyzing a layered structure.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126534295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the problem of segmenting a 3D point cloud obtained from several RGB-D cameras into a set of 3D piecewise planar regions. This is a fundamental problem in computer vision, whose solution is helpful for further scene analysis, such as support inference and object localisation. In existing planar segmentation approaches for point clouds, the point cloud originates from a single RGB-D view. There is, however, a growing interest in monitoring environments with computer vision setups that contain a set of calibrated 3D cameras located around the scene. To fully exploit the multi-view aspect of such setups, we propose in this paper a novel approach to perform the piecewise planar segmentation directly in 3D. This approach, called Voxel-MRF (V-MRF), is based on discrete 3D Markov random fields, whose nodes correspond to scene voxels and whose labels represent 3D planes. The voxelization of the scene makes it possible to cope with noisy depth measurements, while the MRF formulation provides a natural handling of the 3D spatial constraints during the optimisation. The approach results in a decomposition of the scene into a set of 3D planar patches. A by-product of the method is also a joint segmentation of the original images into planar regions with consistent labels across the views. We demonstrate the advantages of our approach using a benchmark dataset of objects with known geometry. We also present qualitative results on challenging data acquired by a multi-camera system installed in two operating rooms.
{"title":"Piecewise Planar Decomposition of 3D Point Clouds Obtained from Multiple Static RGB-D Cameras","authors":"F. Barrera, N. Padoy","doi":"10.1109/3DV.2014.57","DOIUrl":"https://doi.org/10.1109/3DV.2014.57","url":null,"abstract":"In this paper, we address the problem of segmenting a 3D point cloud obtained from several RGB-D cameras into a set of 3D piecewise planar regions. This is a fundamental problem in computer vision, whose solution is helpful for further scene analysis, such as support inference and object localisation. In existing planar segmentation approaches for point clouds, the point cloud originates from a single RGB-D view. There is however a growing interest to monitor environments with computer vision setups that contain a set of calibrated 3D cameras located around the scene. To fully exploit the multi-view aspect of such setups, we propose in this paper a novel approach to perform the planar piecewise segmentation directly in 3D. This approach, called Voxel-MRF (V-MRF), is based on discrete 3D Markov random fields, whose nodes correspond to scene voxels and whose labels represent 3D planes. The voxelization of the scene permits to cope with noisy depth measurements, while the MRF formulation provides a natural handling of the 3D spatial constraints during the optimisation. The approach results in a decomposition of the scene into a set of 3D planar patches. A by-product of the method is also a joint planar segmentation of the original images into planar regions with consistent labels across the views. We demonstrate the advantages of our approach using a benchmark dataset of objects with known geometry. We also present qualitative results on challenging data acquired by a multi-camera system installed in two operating rooms.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115131764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a novel approach to generate human body models in a variety of shapes and poses by tuning semantic parameters. Our approach is investigated with datasets of up to 3000 scanned body models which have been placed in point-to-point correspondence. Correspondence is established by nonrigid deformation of a template mesh. The large dataset allows a local model to be learned robustly, in which individual parts of the human body can be accurately reshaped according to semantic parameters. We evaluate performance on two datasets and find that our model outperforms existing methods.
{"title":"Semantic Parametric Reshaping of Human Body Models","authors":"Yipin Yang, Yao Yu, Yu Zhou, S. Du, James Davis, Ruigang Yang","doi":"10.1109/3DV.2014.47","DOIUrl":"https://doi.org/10.1109/3DV.2014.47","url":null,"abstract":"We develop a novel approach to generate human body models in a variety of shapes and poses via tuning semantic parameters. Our approach is investigated with datasets of up to 3000 scanned body models which have been placed in point to point correspondence. Correspondence is established by nonrigid deformation of a template mesh. The large dataset allows a local model to be learned robustly, in which individual parts of the human body can be accurately reshaped according to semantic parameters. We evaluate performance on two datasets and find that our model outperforms existing methods.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127827996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We address the problem of geo-registering ground-based multi-view stereo models by ground-to-aerial image matching. The main contribution is a fully automated geo-registration pipeline with a novel viewpoint-dependent matching method that handles the viewpoint variation between ground and aerial images. We conduct large-scale experiments on many popular outdoor landmarks in Rome. The proposed approach demonstrates a high success rate for the task and dramatically outperforms state-of-the-art techniques, yielding geo-registration at pixel-level accuracy.
{"title":"Accurate Geo-Registration by Ground-to-Aerial Image Matching","authors":"Qi Shan, Changchang Wu, B. Curless, Yasutaka Furukawa, Carlos Hernández, S. Seitz","doi":"10.1109/3DV.2014.69","DOIUrl":"https://doi.org/10.1109/3DV.2014.69","url":null,"abstract":"We address the problem of geo-registering ground-based multi-view stereo models by ground-to-aerial image matching. The main contribution is a fully automated geo-registration pipeline with a novel viewpoint-dependent matching method that handles ground to aerial viewpoint variation. We conduct large-scale experiments which consist of many popular outdoor landmarks in Rome. The proposed approach demonstrates a high success rate for the task, and dramatically outperforms state-of-the-art techniques, yielding geo-registration at pixel-level accuracy.","PeriodicalId":275516,"journal":{"name":"2014 2nd International Conference on 3D Vision","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127861025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}