RotPredictor: Unsupervised Canonical Viewpoint Learning for Point Cloud Classification
Jin Fang, Dingfu Zhou, Xibin Song, Sheng Jin, Ruigang Yang, Liangjun Zhang
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00109
Recently, significant progress has been achieved in analyzing 3D point clouds with deep learning techniques. However, existing networks suffer from poor generalization and robustness to arbitrary rotations applied to the input point cloud. Unlike traditional strategies that improve rotation robustness with data augmentation or with specially designed spherical representations or harmonics-based kernels, we propose to rotate the point cloud into a canonical viewpoint to boost the downstream target task, e.g., object classification and part segmentation. Specifically, the canonical viewpoint is predicted by the RotPredictor network in an unsupervised way, and the loss function is built only on the target task. RotPredictor approximately satisfies the rotation-equivariance property, and its predicted output has a linear relationship with the applied rotation transformation. In addition, RotPredictor is an independent plug-and-play module that can be employed by any point-based deep learning framework without extra burden. Experimental results on the public model classification dataset ModelNet40 show that the performance of all baselines can be boosted by integrating the proposed module. Moreover, by adding the proposed module, we achieve state-of-the-art classification accuracy of 90.2% on the rotation-augmented ModelNet40 benchmark.

Time Shifted IMU Preintegration for Temporal Calibration in Incremental Visual-Inertial Initialization
Bruno Petit, Richard Guillemard, V. Gay-Bellile
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00027
Tightly coupled visual-inertial SLAM (VISLAM) algorithms are now state-of-the-art approaches for indoor localization. There are many implementations of VISLAM, such as filter-based and nonlinear-optimization-based algorithms. They all require an accurate temporal alignment between the sensor clocks and an initial IMU state (gyroscope and accelerometer bias values, gravity direction, and initial velocity) for precise localization. In this paper we propose an initialization procedure for VISLAM that simultaneously estimates the IMU-camera temporal calibration and the initial IMU state. To this end, we introduce the concept of Time Shifted IMU Preintegration (TSIP) measurements: an interpolation of IMU preintegration that takes into account the effect of sensor clock misalignment. These TSIP measurements are included along with visual odometry measurements in a graph that is incrementally optimized. This results in a real-time, accurate, and robust initialization for VISLAM, as demonstrated in experiments on real data.
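
The abstract does not spell out how the time shift enters the preintegrated terms. As a rough illustration only, the sketch below applies a first-order correction to preintegrated rotation, velocity, and position deltas when the integration window is shifted by a small clock offset `td`, using the IMU samples at the window boundaries; the paper's actual TSIP interpolation may be formulated differently.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: rotation matrix from a rotation vector."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)
    k = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)

def shift_preintegration(dR, dv, dp, dt, w_i, a_i, w_j, a_j, td):
    """First-order shift of preintegrated IMU deltas when the window
    [t_i, t_j] (length dt) is moved by a small clock offset td.

    dR, dv, dp : preintegrated rotation (3x3), velocity (3,), position (3,)
    w_i, a_i   : bias-corrected gyro / accel samples at the window start
    w_j, a_j   : bias-corrected gyro / accel samples at the window end
    td         : camera-IMU time offset in seconds (assumed small)

    Bias Jacobians and noise covariance propagation are ignored here.
    """
    Ri = exp_so3(w_i * td)   # rotation over the slice dropped at the start
    Rj = exp_so3(w_j * td)   # rotation over the slice appended at the end
    dR_s = Ri.T @ dR @ Rj
    dv_s = Ri.T @ (dv - a_i * td + dR @ a_j * td)
    dp_s = Ri.T @ (dp + dv * td - a_i * dt * td)
    return dR_s, dv_s, dp_s
```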
{"title":"Time Shifted IMU Preintegration for Temporal Calibration in Incremental Visual-Inertial Initialization","authors":"Bruno Petit, Richard Guillemard, V. Gay-Bellile","doi":"10.1109/3DV50981.2020.00027","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00027","url":null,"abstract":"Tightly coupled Visual-Inertial SLAM (VISLAM) algorithms are now state of the art approaches for indoor localization. There are many implementations of VISLAM, like filter-based and non-linear optimization based algorithms. They all require an accurate temporal alignment between sensors clock and an initial IMU state gyroscope and accelerometer biases value, gravity direction and initial velocity) for precise localization. In this paper we propose an initialization procedure of VISLAM that estimates simultaneously IMU-camera temporal calibration and the initial IMU state. To this end, the concept of Time Shifted IMU Preintegration} (TSIP) measurements is introduced. an interpolation of IMU preintegration that takes into account the effect of sensors clock misalignment. These TSIP measurements are included along with visual odometry measurements in a graph that is incrementally optimized. It results in a real time, accurate and robust initialization for VISLAM as demonstrated in the experiments on real data.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115357023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

KeystoneDepth: History in 3D
Xuan Luo, Yanmeng Kong, Jason Lawrence, Ricardo Martin-Brualla, S. Seitz
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00056
This paper introduces KeystoneDepth, the largest and most diverse collection of rectified historical stereo image pairs to date, consisting of tens of thousands of stereographs of people, events, objects, and scenes recorded between 1864 and 1966. Leveraging the Keystone-Mast Collection of stereographs from the California Museum of Photography, we apply multiple processing steps to produce clean stereo image pairs, complete with calibration data, rectification transforms, and disparity maps. We introduce a novel stereo rectification technique based on the unique properties of antique stereo cameras. To better visualize the results on 2D displays, we also introduce a self-supervised deep view synthesis technique trained on historical imagery. Our dataset is available at http://keystonedepth.cs.washington.edu/.
{"title":"KeystoneDepth: History in 3D","authors":"Xuan Luo, Yanmeng Kong, Jason Lawrence, Ricardo Martin-Brualla, S. Seitz","doi":"10.1109/3DV50981.2020.00056","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00056","url":null,"abstract":"This paper introduces KeystoneDepth, the largest and most diverse collection of rectified historical stereo image pairs to date, consisting of tens of thousands of stereographs of people, events, objects, and scenes recorded between 1864 and 1966. Leveraging the Keystone-Mast Collection of stereographs from the California Museum of Photography, we apply multiple processing steps to produce clean stereo image pairs, complete with calibration data, rectification transforms, and disparity maps. We introduce a novel stereo rectification technique based on the unique properties of antique stereo cameras. To better visualize the results on 2D displays, we also introduce a self-supervised deep view synthesis technique trained on historical imagery. Our dataset is available at http://keystonedepth.cs.washington.edu/.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115456636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Screen-space Regularization on Differentiable Rasterization
Kunyao Chen, Cheolhong An, Truong Q. Nguyen
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00032
Rasterization bridges the 3D meshes of a scene and their 2D visual appearance from different viewpoints, and it plays a vital role in vision and graphics. Much research has focused on designing differentiable rasterization and making it compatible with current learning-based frameworks. Although some global-gradient methods achieve promising results, they still ignore a substantial issue present in most situations: the available series of 2D silhouettes may not precisely represent the underlying 3D object. To tackle this problem directly, we propose a screen-space regularization method. Unlike common geometric regularization, our method targets the unbalanced deformation caused by limited viewpoints. By applying the regularization to both multi-view deformation and single-view reconstruction tasks, the proposed method significantly enhances the visual appearance of the results of a local-gradient differentiable rasterizer, i.e., it reduces visual hull redundancy. Compared to the state-of-the-art global-gradient method, the proposed method achieves better numerical results with much lower complexity.
{"title":"Screen-space Regularization on Differentiable Rasterization","authors":"Kunyao Chen, Cheolhong An, Truong Q. Nguyen","doi":"10.1109/3DV50981.2020.00032","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00032","url":null,"abstract":"Rasterization bridges 3D meshes of a scene and 2D visual appearance on different viewpoints. It plays a vital role in vision and graphics area. Many researches focus on designing a differentiable rasterization and make it compatible with current learning-based frameworks. Although some global-gradient methods achieve promising results, they still ignore one substantial issue existing in most of the situations that the series of 2D silhouettes may not precisely represent the underlying 3D object. To directly tackle this problem, we propose a screen-space regularization method. Unlike the common geometric regularization, our method targets the unbalanced deformation due to the limited viewpoints. By applying the regularization to both multi-view deformation and single-view reconstruction tasks, the proposed method can significantly enhance the visual appearance for the results of a local-gradient differentiable rasterizer, i.e. reducing the visual hull redundancy. Comparing to the state-of-the-art global-gradient method, the proposed method achieves better numerical results with much lower complexity.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114371941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

PanoNet3D: Combining Semantic and Geometric Understanding for LiDAR Point Cloud Detection
Xia Chen, Jianren Wang, David Held, M. Hebert
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00085
Visual data in autonomous driving perception, such as camera images and LiDAR point clouds, can be interpreted as a mixture of two aspects: semantic features and geometric structure. Semantics come from the appearance and context of objects as observed by the sensor, while geometric structure is the actual 3D shape of the point cloud. Most detectors on LiDAR point clouds focus only on analyzing the geometric structure of objects in real 3D space. Unlike previous works, we propose to learn both semantic features and geometric structure via a unified multi-view framework. Our method exploits the nature of LiDAR scans as 2D range images and applies well-studied 2D convolutions to extract semantic features. By fusing semantic and geometric features, our method outperforms state-of-the-art approaches in all categories by a large margin. The methodology of combining semantic and geometric features provides a unique perspective on the problems in real-world 3D point cloud detection.
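
The 2D range-image view of a LiDAR scan that the semantic branch operates on is obtained by spherical projection. A minimal sketch of that projection is given below; the image resolution and vertical field of view are typical 64-beam values chosen here for illustration, not the paper's exact settings.

```python
import numpy as np

def pointcloud_to_range_image(points, h=64, w=2048,
                              fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud into an (h, w) range image
    via spherical projection."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8          # range per point
    yaw = np.arctan2(y, x)                             # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                           # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalized image coordinates: u along azimuth, v along elevation.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down) / fov) * h
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Keep the closest return per pixel by writing far-to-near.
    range_image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(r)[::-1]
    range_image[v[order], u[order]] = r[order]
    return range_image
```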
{"title":"PanoNet3D: Combining Semantic and Geometric Understanding for LiDAR Point Cloud Detection","authors":"Xia Chen, Jianren Wang, David Held, M. Hebert","doi":"10.1109/3DV50981.2020.00085","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00085","url":null,"abstract":"Visual data in autonomous driving perception, such as camera image and LiDAR point cloud, can be interpreted as a mixture of two aspects: semantic feature and geometric structure. Semantics come from the appearance and context of objects to the sensor, while geometric structure is the actual 3D shape of point clouds. Most detectors on LiDAR point clouds focus only on analyzing the geometric structure of objects in real 3D space. Unlike previous works, we propose to learn both semantic feature and geometric structure via a unified multi-view framework. Our method exploits the nature of LiDAR scans – 2D range images, and applies well-studied 2D convolutions to extract semantic features. By fusing semantic and geometric features, our method outperforms state-of-the-art approaches in all categories by a large margin. The methodology of combining semantic and geometric features provides a unique perspective of looking at the problems in real-world 3D point cloud detection.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"327 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127572096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Restoration of Motion Blur in Time-of-Flight Depth Image Using Data Alignment
Zhuo Chen, Peilin Liu, Fei Wen, Jun Wang, R. Ying
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00092
Time-of-flight (ToF) sensors are vulnerable to motion blur in the presence of moving objects. This is due to the operating principle of ToF cameras, which estimate depth from the phase shift between the emitted and received modulated signals. The phase shift is measured from four sequential phase-shifted images, which are assumed to be consistent within one integration time. However, object motion gives rise to discrepancies among the four phase-shifted images, leading to unreliable depth measurements. In this paper, we propose a novel method that aligns the four phase-shifted images by investigating the electronic value of each pixel in the phase images. It consists of two steps: motion detection and deblurring. Furthermore, a refinement utilizing an additional group of phase-shifted images is adopted to further improve the accuracy of the depth measurement. Experimental results on a new, carefully constructed dataset with ground truth demonstrate that the proposed method compares favorably with existing methods in both accuracy and runtime. In particular, the new method achieves the best accuracy while being computationally efficient enough to support real-time operation.
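
For context, the four-phase depth computation the abstract alludes to is the standard four-bucket formula shown below; motion between the four captures violates the assumption that all four samples observe the same depth, which is exactly the blur the paper addresses. This is the textbook formula, not code from the paper.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light, m/s

def tof_depth_from_four_phases(c0, c1, c2, c3, f_mod):
    """Standard four-phase (four-bucket) continuous-wave ToF depth.

    c0..c3 : correlation images captured at phase offsets 0, 90, 180, 270 deg
    f_mod  : modulation frequency in Hz
    """
    # Phase shift between emitted and received signals, wrapped to [0, 2*pi).
    phase = np.arctan2(c3 - c1, c0 - c2)
    phase = np.mod(phase, 2.0 * np.pi)
    # One full phase cycle corresponds to half the modulation wavelength.
    depth = C_LIGHT * phase / (4.0 * np.pi * f_mod)
    return depth
```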
{"title":"Restoration of Motion Blur in Time-of-Flight Depth Image Using Data Alignment","authors":"Zhuo Chen, Peilin Liu, Fei Wen, Jun Wang, R. Ying","doi":"10.1109/3DV50981.2020.00092","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00092","url":null,"abstract":"Time-of-flight (ToF) sensors are vulnerable to motion blur in the presence of moving objects. This is due to the principle of ToF camera that it estimates depth from the phase-shift between emitted and received modulated signals. And the phase-shift is measured by four sequential phase-shifted images, which is assumed to be consistent in an integration time. However, object motion would give rise to disparity among the four phase-shifted images, contributing to unreliable depth measurement. In this paper, we propose a novel method that is capable of aligning the four phase-shifted images through investigating the electronic value of each pixel in the phase images. It consists of two steps, motion detecting and deblurring. Furthermore, a refinement utilizing an additional group of phase-shifted images is adopted to further improve the accuracy of depth measurement. Experiment results on a new elaborated dataset with ground-truth demonstrate that the proposed method compares favorably over existing methods in both accuracy and runtime. Particularly, the new method can achieve the best accuracy while being computationally efficient that can support real-time running.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126305369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Deep Depth Estimation on 360° Images with a Double Quaternion Loss
Brandon Yushan Feng, Wangjue Yao, Zhe-Yu Liu, A. Varshney
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00062
While 360° images are becoming ubiquitous due to the popularity of panoramic content, they do not work directly with most existing depth estimation techniques developed for perspective images. In this paper, we present a deep-learning-based framework for estimating depth from 360° images. We present an adaptive depth refinement procedure that refines depth estimates using normal estimates and pixel-wise uncertainty scores. We introduce a double quaternion approximation to combine the losses for the joint estimation of depth and surface normals. Furthermore, we use the double quaternion formulation to also measure stereo consistency between horizontally displaced depth maps, leading to a new loss function for training a depth estimation CNN. Results show that the new double-quaternion-based loss and the adaptive depth refinement procedure lead to better network performance. Our proposed method can be used with monocular as well as stereo images. When evaluated on several datasets, our method surpasses state-of-the-art methods on most metrics.
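
One reason perspective-image depth networks do not transfer directly is the equirectangular geometry: each pixel corresponds to a direction on the sphere, and depth is a distance along that ray rather than a planar z-depth. The sketch below shows this standard back-projection for context; the axis conventions are assumptions, not taken from the paper.

```python
import numpy as np

def equirect_to_rays(h, w):
    """Unit ray directions for every pixel of an h x w equirectangular
    (360 x 180 degree) image."""
    # Pixel centers mapped to longitude [-pi, pi) and latitude [-pi/2, pi/2].
    u = (np.arange(w) + 0.5) / w
    v = (np.arange(h) + 0.5) / h
    lon = (u - 0.5) * 2.0 * np.pi
    lat = (0.5 - v) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Spherical-to-Cartesian with y pointing up (an assumed convention).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)   # (h, w, 3)

def depth_to_points(depth, rays):
    """Back-project a per-pixel depth map (distance along each ray)
    into a 3D point per pixel."""
    return depth[..., None] * rays
```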
{"title":"Deep Depth Estimation on 360° Images with a Double Quaternion Loss","authors":"Brandon Yushan Feng, Wangjue Yao, Zhe-Yu Liu, A. Varshney","doi":"10.1109/3DV50981.2020.00062","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00062","url":null,"abstract":"While 360° images are becoming ubiquitous due to popularity of panoramic content, they cannot directly work with most of the existing depth estimation techniques developed for perspective images. In this paper, we present a deep-learning-based framework of estimating depth from 360° images. We present an adaptive depth refinement procedure that refines depth estimates using normal estimates and pixel-wise uncertainty scores. We introduce double quaternion approximation to combine the loss of the joint estimation of depth and surface normal. Furthermore, we use the double quaternion formulation to also measure stereo consistency between the horizontally displaced depth maps, leading to a new loss function for training a depth estimation CNN. Results show that the new double-quaternion-based loss and the adaptive depth refinement procedure lead to better network performance. Our proposed method can be used with monocular as well as stereo images. When evaluated on several datasets, our method surpasses state-of-the-art methods on most metrics.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126449920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

3D-Aware Ellipse Prediction for Object-Based Camera Pose Estimation
Matthieu Zins, Gilles Simon, M. Berger
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00038
In this paper, we propose a method for coarse camera pose computation that is robust to viewing conditions and does not require a detailed model of the scene. This method meets the growing need for easy deployment of robotics or augmented reality applications in any environment, especially those for which no accurate 3D model nor large amount of ground-truth data is available. It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions. Previous works have also shown that abstracting the geometry of a scene of objects by an ellipsoid cloud allows the camera pose to be computed accurately enough for various application needs. Though promising, these approaches use the ellipses fitted to the detection bounding boxes as an approximation of the imaged objects. In this paper, we go one step further and propose a learning-based method that detects improved elliptic approximations of objects that are coherent with the 3D ellipsoids under perspective projection. Experiments show that the accuracy of the computed pose increases significantly thanks to our method and is more robust to variability in the boundaries of the detection boxes. This is achieved with very little effort in terms of training data acquisition – a few hundred calibrated images, of which only three need manual object annotation.
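
The coherence between a detected ellipse and a 3D ellipsoid under perspective projection rests on the standard dual quadric–dual conic relation below; this is the classical projective-geometry result that ellipse/ellipsoid pose methods build on, reproduced here for context rather than taken from the paper:

\[
C^{*} \simeq P \, Q^{*} P^{\top}, \qquad P = K\,[\,R \mid t\,],
\]

where \(Q^{*}\) is the 4×4 dual quadric of the ellipsoid, \(C^{*}\) is the 3×3 dual conic of its image ellipse, \(P\) is the 3×4 camera projection matrix, and \(\simeq\) denotes equality up to scale.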
{"title":"3D-Aware Ellipse Prediction for Object-Based Camera Pose Estimation","authors":"Matthieu Zins, Gilles Simon, M. Berger","doi":"10.1109/3DV50981.2020.00038","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00038","url":null,"abstract":"In this paper, we propose a method for coarse camera pose computation which is robust to viewing conditions and does not require a detailed model of the scene. This method meets the growing need of easy deployment of robotics or augmented reality applications in any environments, especially those for which no accurate 3D model nor huge amount of ground truth data are available. It exploits the ability of deep learning techniques to reliably detect objects regardless of viewing conditions. Previous works have also shown that abstracting the geometry of a scene of objects by an ellipsoid cloud allows to compute the camera pose accurately enough for various application needs. Though promising, these approaches use the ellipses fitted to the detection bounding boxes as an approximation of the imaged objects. In this paper, we go one step further and propose a learning-based method which detects improved elliptic approximations of objects which are coherent with the 3D ellipsoid in terms of perspective projection. Experiments prove that the accuracy of the computed pose significantly increases thanks to our method and is more robust to the variability of the boundaries of the detection boxes. This is achieved with very little effort in terms of training data acquisition – a few hundred calibrated images of which only three need manual object annotation.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132938697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Learning 3D Faces from Photo-Realistic Facial Synthesis
Ruizhe Wang, Chih-Fan Chen, Hao Peng, Xudong Liu, Xin Li
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00096
We present an approach to efficiently learn an accurate and complete 3D face model from a single image. Previous methods rely heavily on 3D Morphable Models to populate the facial shape space, as well as on an over-simplified shading model for image formation. By contrast, our method directly augments a large set of 3D faces from a compact collection of facial scans and employs a high-quality rendering engine to synthesize the corresponding photo-realistic facial images. We first use a deep neural network to regress vertex coordinates from the given image and then refine them with a non-rigid deformation process to more accurately capture local shape similarity. We have conducted extensive experiments to demonstrate the superiority of the proposed approach on 2D-to-3D facial shape inference, in particular its excellent generalization to real-world selfie images.
{"title":"Learning 3D Faces from Photo-Realistic Facial Synthesis","authors":"Ruizhe Wang, Chih-Fan Chen, Hao Peng, Xudong Liu, Xin Li","doi":"10.1109/3DV50981.2020.00096","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00096","url":null,"abstract":"We present an approach to efficiently learn an accurate and complete 3D face model from a single image. Previous methods heavily rely on 3D Morphable Models to populate the facial shape space as well as an over-simplified shading model for image formulation. By contrast, our method directly augments a large set of 3D faces from a compact collection of facial scans and employs a high-quality rendering engine to synthesize the corresponding photo-realistic facial images. We first use a deep neural network to regress vertex coordinates from the given image and then refine them by a non-rigid deformation process to more accurately capture local shape similarity. We have conducted extensive experiments to demonstrate the superiority of the proposed approach on 2D-to-3D facial shape inference, especially its excellent generalization property on real-world selfie images.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129805881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Improving Structure from Motion with Reliable Resectioning
Rajbir Kataria, Joseph DeGol, Derek Hoiem
Pub Date: 2020-11-01 | DOI: 10.1109/3DV50981.2020.00014
A common cause of failure in structure-from-motion (SfM) is misregistration of images due to visual patterns that occur in more than one scene location. Most work to solve this problem ignores image matches that are inconsistent according to the statistics of the tracks graph, but these methods often need to be tuned for each dataset and can lead to reduced completeness of normally good reconstructions when valid matches are removed. Our key idea is to address ambiguity directly in the reconstruction process by using only a subset of reliable matches to determine resectioning order and the initial pose. We also introduce a new measure of similarity that adjusts the influence of feature matches based on their track length. We show this improves reconstruction robustness for two state-of-the-art SfM algorithms on many diverse datasets.
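
The abstract does not define the track-length-based similarity measure, so the sketch below is purely illustrative: it scores an image pair by summing per-match weights that depend on the length of the track each match belongs to, with the weighting function left as a parameter. The function name and the default weight are hypothetical, not the paper's.

```python
from collections import defaultdict

def track_weighted_similarity(pair_matches, track_length,
                              weight=lambda length: 1.0 / length):
    """Illustrative image-pair similarity where each feature match's
    influence depends on the length of its track.

    pair_matches : dict mapping (img_a, img_b) -> list of track ids matched
                   between the two images
    track_length : dict mapping track id -> number of images observing it
    weight       : callable mapping track length -> match weight; whether
                   longer tracks should count more or less is exactly the
                   design choice the paper's measure encodes, so the
                   inverse-length default here is only a placeholder.
    """
    similarity = defaultdict(float)
    for pair, track_ids in pair_matches.items():
        for tid in track_ids:
            similarity[pair] += weight(track_length[tid])
    return dict(similarity)

# Example usage with hypothetical data:
# sim = track_weighted_similarity({("img1", "img2"): [0, 1, 2]},
#                                 {0: 2, 1: 5, 2: 3})
```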
{"title":"Improving Structure from Motion with Reliable Resectioning","authors":"Rajbir Kataria, Joseph DeGol, Derek Hoiem","doi":"10.1109/3DV50981.2020.00014","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00014","url":null,"abstract":"A common cause of failure in structure-from-motion (SfM) is misregistration of images due to visual patterns that occur in more than one scene location. Most work to solve this problem ignores image matches that are inconsistent according to the statistics of the tracks graph, but these methods often need to be tuned for each dataset and can lead to reduced completeness of normally good reconstructions when valid matches are removed. Our key idea is to address ambiguity directly in the reconstruction process by using only a subset of reliable matches to determine resectioning order and the initial pose. We also introduce a new measure of similarity that adjusts the influence of feature matches based on their track length. We show this improves reconstruction robustness for two state-of-the-art SfM algorithms on many diverse datasets.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128306207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}