
Latest publications from the 2016 Fourth International Conference on 3D Vision (3DV)

Progressive 3D Modeling All the Way
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.11
Alex Locher, M. Havlena, L. Gool
This work proposes a method bridging the existing gap between progressive sparse 3D reconstruction (incremental Structure from Motion) and progressive point-based dense 3D reconstruction (Multi-View Stereo). The presented algorithm is capable of adapting an existing dense 3D model to changes such as the addition or removal of images, the merging of scene parts, or changes in the underlying camera calibration. The existing 3D model is transformed as consistently as possible and its structure is reused as much as possible without sacrificing the accuracy and/or completeness of the final result. A significant decrease in runtime is achieved compared to re-computing a new dense point cloud from scratch. We demonstrate the performance of the algorithm in various experiments on publicly available datasets of different sizes and compare it to the baseline. The work interacts seamlessly with publicly available software, enabling an integrated progressive 3D modeling pipeline.
Citations: 2
Deep Stereo Fusion: Combining Multiple Disparity Hypotheses with Deep-Learning
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.22
Matteo Poggi, S. Mattoccia
Stereo matching is a popular technique to infer depth from two or more images, and a wealth of methods has been proposed to deal with this problem. Despite these efforts, finding accurate stereo correspondences is still an open problem. The strengths and weaknesses of existing methods are often complementary, and in this paper, motivated by recent trends in this field, we exploit this fact by proposing Deep Stereo Fusion, a Convolutional Neural Network capable of combining the output of multiple stereo algorithms in order to obtain a more accurate result than each input disparity map. Deep Stereo Fusion processes a 3D feature vector, encoding both spatial and cross-algorithm information, in order to select the best disparity hypothesis among those proposed by the single stereo matchers. To the best of our knowledge, our proposal is the first to i) leverage deep learning for this task and ii) predict the optimal disparity assignments by taking only the disparity maps as input cue. This second feature makes our method suitable for deployment even when other cues (e.g., confidence) are not available, such as when dealing with disparity maps provided by off-the-shelf 3D sensors. We thoroughly evaluate our proposal on the KITTI stereo benchmark against the state of the art in this field.
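To make the fusion idea concrete, here is a minimal sketch (PyTorch, with assumed layer sizes and names, not the authors' architecture) of a network that stacks N disparity hypotheses as input channels and predicts a per-pixel soft selection over them; the paper describes a hard per-pixel hypothesis selection, so this soft variant only approximates that behavior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisparityFusionNet(nn.Module):
    """Illustrative fusion network: N disparity maps in, one fused map out."""
    def __init__(self, num_hypotheses: int, width: int = 32):
        super().__init__()
        self.conv1 = nn.Conv2d(num_hypotheses, width, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(width, num_hypotheses, kernel_size=3, padding=1)

    def forward(self, disparities: torch.Tensor) -> torch.Tensor:
        # disparities: (B, N, H, W), one channel per stereo algorithm's output
        x = F.relu(self.conv1(disparities))
        x = F.relu(self.conv2(x))
        logits = self.conv3(x)                       # per-pixel score for each hypothesis
        weights = F.softmax(logits, dim=1)           # soft selection over the N hypotheses
        fused = (weights * disparities).sum(dim=1)   # (B, H, W) fused disparity map
        return fused

# Usage: fuse three disparity maps coming from different stereo matchers.
net = DisparityFusionNet(num_hypotheses=3)
fused = net(torch.rand(1, 3, 120, 160))
```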
Citations: 27
Video Depth-from-Defocus
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.46
Hyeongwoo Kim, Christian Richardt, C. Theobalt
Many compelling video post-processing effects, in particular aesthetic focus editing and refocusing effects, are feasible if per-frame depth information is available. Existing computational methods to capture RGB and depth either purposefully modify the optics (coded aperture, light-field imaging), or employ active RGB-D cameras. Since these methods are less practical for users with normal cameras, we present an algorithm to capture all-in-focus RGB-D video of dynamic scenes with an unmodified commodity video camera. Our algorithm turns the often unwanted defocus blur into a valuable signal. The input to our method is a video in which the focus plane is continuously moving back and forth during capture, and thus defocus blur is provoked and strongly visible. This can be achieved by manually turning the focus ring of the lens during recording. The core algorithmic ingredient is a new video-based depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video, and the focus distance for each frame. We extensively evaluate our approach, and show that it enables compelling video post-processing effects, such as different types of refocusing.
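For context, the sketch below shows the standard thin-lens relation that depth-from-defocus methods build on: the defocus blur diameter of a scene point depends on its depth and on the current focus distance. The symbols and helper function are generic optics and assumptions for illustration, not the paper's specific estimator.

```python
def blur_circle_diameter(depth, focus_distance, focal_length, aperture_diameter):
    """Diameter of the circle of confusion for a thin lens (same units as focal_length).

    c = A * f * |s - s_f| / (s * (s_f - f)), with s the object depth and s_f the focus distance.
    """
    return (aperture_diameter * focal_length * abs(depth - focus_distance)
            / (depth * (focus_distance - focal_length)))

# As the focus ring sweeps (focus_distance changes frame to frame), each point's
# blur varies; the frame in which the point is sharpest reveals its depth.
print(blur_circle_diameter(depth=2000.0, focus_distance=1000.0,
                           focal_length=50.0, aperture_diameter=25.0))
```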
Citations: 11
Tracking Deformable Surfaces That Undergo Topological Changes Using an RGB-D Camera
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.42
Aggeliki Tsoli, Antonis A. Argyros
We present a method for 3D tracking of deformable surfaces with dynamic topology, for instance a paper that undergoes cutting or tearing. Existing template-based methods assume a template of fixed topology. Thus, they fail in tracking deformable objects that undergo topological changes. In our work, we employ a dynamic template (3D mesh) whose topology evolves based on the topological changes of the observed geometry. Our tracking framework deforms the defined template based on three types of constraints: (a) the surface of the template has to be registered to the 3D shape of the tracked surface, (b) the template deformation should respect feature (SIFT) correspondences between selected pairs of frames, and (c) the lengths of the template edges should be preserved. The latter constraint is relaxed when an edge is found to lie on a "geometric gap", that is, when a significant depth discontinuity is detected along this edge. The topology of the template is updated on the fly by removing overstretched edges that lie on a geometric gap. The proposed method has been evaluated quantitatively and qualitatively on both synthetic and real sequences of monocular RGB-D views of surfaces that undergo various types of topological changes. The obtained results show that our approach effectively tracks objects with evolving topology and outperforms state-of-the-art methods in tracking accuracy.
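As an illustration of constraint (c) and its relaxation at geometric gaps, a small sketch with assumed function and variable names follows; it is not the authors' implementation.

```python
import numpy as np

def edge_length_energy(vertices, edges, rest_lengths, on_gap):
    """Edge-length preservation energy with relaxation at depth discontinuities.

    vertices: (V, 3) mesh vertex positions, edges: (E, 2) vertex index pairs,
    rest_lengths: (E,) edge lengths in the rest template,
    on_gap: (E,) boolean flags from a depth-discontinuity test along each edge.
    """
    v0 = vertices[edges[:, 0]]
    v1 = vertices[edges[:, 1]]
    lengths = np.linalg.norm(v0 - v1, axis=1)
    weights = np.where(on_gap, 0.0, 1.0)   # relax edges crossing a geometric gap
    return np.sum(weights * (lengths - rest_lengths) ** 2)
```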
Citations: 9
Coupled Functional Maps
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.49
D. Eynard, E. Rodolà, K. Glashoff, M. Bronstein
Classical formulations of the shape matching problem involve the definition of a matching cost that directly depends on the action of the desired map when applied to some input data. Such formulations are typically one-sided - they seek a mapping from one shape to the other, but not vice versa. In this paper we consider an unbiased formulation of this problem, in which we solve simultaneously for a low-distortion map relating the two given shapes and its inverse. We phrase the problem in the spectral domain using the language of functional maps, resulting in an especially compact and efficient optimization problem. The benefits of our proposed regularization are especially evident in the scarce data setting, where we demonstrate highly competitive results with respect to the state of the art.
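One common way to write such an unbiased, coupled objective in the functional-map language is sketched below; the symbols are generic (A and B are spectral coefficient matrices of corresponding descriptor functions on the two shapes, C12 and C21 the forward and inverse functional maps, mu a coupling weight), and the exact energy and regularizers used in the paper may differ.

```latex
\min_{C_{12},\,C_{21}}\;
  \|C_{12}A - B\|_F^2
  \;+\; \|C_{21}B - A\|_F^2
  \;+\; \mu\,\|C_{12}C_{21} - I\|_F^2
```

The first two terms ask each map to transport descriptors consistently in its own direction, while the last term couples the two unknowns by encouraging them to be (approximately) inverse to each other.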
Citations: 51
A Closed-Form Bayesian Fusion Equation Using Occupancy Probabilities
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.47
Charles T. Loop, Q. Cai, Sergio Orts, P. Chou
We present a new mathematical framework for multi-view surface reconstruction from a set of calibrated color and depth images. We estimate the occupancy probability of points in space along sight rays, and combine these estimates using a normalized product derived from Bayes' rule. The advantage of this approach is that the free space constraint is a natural consequence of the formulation, and not a separate logical operation. We present a single closed-form implicit expression for the reconstructed surface in terms of the image data and camera projections, making analytic properties such as surface normals not only easy to compute, but exact. This expression can be efficiently evaluated on the GPU, making it ideal for high performance real-time applications, such as live human body capture for immersive telepresence.
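The normalized product the abstract refers to can be illustrated with the standard Bayes-rule fusion of independent binary occupancy estimates under a uniform prior; the sketch below uses assumed names and is not the paper's exact derivation.

```python
import numpy as np

def fuse_occupancy(probabilities):
    """Fuse per-view occupancy estimates p_i for one point in space."""
    p = np.asarray(probabilities, dtype=float)
    occ = np.prod(p)           # combined evidence that the point is occupied
    free = np.prod(1.0 - p)    # combined evidence that the point is free
    return occ / (occ + free)  # normalized product (Bayes' rule, uniform prior)

# Three views that mildly agree the point is occupied fuse into a stronger belief.
print(fuse_occupancy([0.7, 0.6, 0.8]))  # ~0.93
```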
Citations: 22
3D Human Pose Estimation via Deep Learning from 2D Annotations
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.84
Ernesto Brau, Hao Jiang
We propose a deep convolutional neural network for 3D human pose and camera estimation from monocular images that learns from 2D joint annotations. The proposed network follows the typical architecture, but contains an additional output layer which projects predicted 3D joints onto 2D, and enforces constraints on body part lengths in 3D. We further enforce pose constraints using an independently trained network that learns a prior distribution over 3D poses. We evaluate our approach on several benchmark datasets and compare against state-of-the-art approaches for 3D human pose estimation, achieving comparable performance. Additionally, we show that our approach significantly outperforms other methods in cases where 3D ground truth data is unavailable, and that our network exhibits good generalization properties.
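A minimal sketch of the kind of loss that lets a 3D pose network train from 2D annotations only, combining a reprojection term with a body-part length penalty; the weak-perspective camera model and the names below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def pose_loss(joints_3d, joints_2d, scale, bones, bone_lengths, lam=1.0):
    """joints_3d: (J, 3) predicted pose, joints_2d: (J, 2) annotations,
    scale: weak-perspective scale, bones: (B, 2) joint index pairs,
    bone_lengths: (B,) expected limb lengths, lam: constraint weight."""
    bones = np.asarray(bones)
    projected = scale * joints_3d[:, :2]                  # project 3D joints onto 2D
    reproj = np.sum((projected - joints_2d) ** 2)         # match the 2D annotations
    limb = np.linalg.norm(joints_3d[bones[:, 0]] - joints_3d[bones[:, 1]], axis=1)
    length_penalty = np.sum((limb - bone_lengths) ** 2)   # body-part length constraint
    return reproj + lam * length_penalty
```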
Citations: 47
Will It Last? Learning Stable Features for Long-Term Visual Localization
Pub Date : 2016-10-01 DOI: 10.3929/ETHZ-A-010819535
Marcin Dymczyk, E. Stumm, Juan I. Nieto, R. Siegwart, Igor Gilitschenski
An increasing number of simultaneous localization and mapping (SLAM) systems are using appearance-based localization to improve the quality of pose estimates. However, with the growing time spans and sizes of the areas we want to cover, appearance-based maps often become too large to handle and consist of features that are not always reliable for localization purposes. This paper presents a method for selecting map features that are persistent over time and thus suited for long-term localization. Our methodology relies on a CNN classifier based on image patches and depth maps for recognizing which features are suitable for life-long matchability. Thus, the classifier not only considers the appearance of a feature but also takes into account its expected lifetime. As a result, our feature selection approach produces more compact maps with a high fraction of temporally stable features compared to the current state of the art, while rejecting unstable features that typically harm localization. Our approach is validated on indoor and outdoor datasets that span a period of several months.
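A minimal sketch (assumed architecture and names, not the authors' network) of a patch-plus-depth binary classifier of the kind described, predicting whether a landmark will stay matchable over time.

```python
import torch
import torch.nn as nn

class PersistenceClassifier(nn.Module):
    """Takes an image patch with its depth patch as channels, outputs stability probability."""
    def __init__(self, patch_channels: int = 4):   # e.g. RGB + depth
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(patch_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        x = self.features(patch).flatten(1)
        return torch.sigmoid(self.head(x))          # probability of long-term stability

# Keep only landmarks whose predicted stability exceeds a threshold.
clf = PersistenceClassifier()
keep = clf(torch.rand(8, 4, 32, 32)).squeeze(1) > 0.5
```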
Citations: 22
SceneNN: A Scene Meshes Dataset with aNNotations
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.18
Binh-Son Hua, Quang-Hieu Pham, D. Nguyen, Minh-Khoi Tran, L. Yu, Sai-Kit Yeung
Several RGB-D datasets have been made publicly available over the past few years to facilitate research in computer vision and robotics. However, the lack of comprehensive and fine-grained annotation in these RGB-D datasets has posed challenges to their widespread usage. In this paper, we introduce SceneNN, an RGB-D scene dataset consisting of 100 scenes. All scenes are reconstructed into triangle meshes and have per-vertex and per-pixel annotation. We further enriched the dataset with fine-grained information such as axis-aligned bounding boxes, oriented bounding boxes, and object poses. We used the dataset as a benchmark to evaluate state-of-the-art methods on relevant research problems such as intrinsic decomposition and shape completion. Our dataset and annotation tools are available at http://www.scenenn.net.
Citations: 272
X-Tag: A Fiducial Tag for Flexible and Accurate Bundle Adjustment
Pub Date : 2016-10-01 DOI: 10.1109/3DV.2016.65
Tolga Birdal, Ievgeniia Dobryden, Slobodan Ilic
In this paper we design a novel planar 2D fiducial marker and develop a fast detection algorithm, aiming at easy camera calibration and precise 3D reconstruction at the marker locations via bundle adjustment. Even though an abundance of planar fiducial markers have been made and used in various tasks, none of them has the properties necessary to solve the aforementioned tasks. Our marker, X-tag, enjoys a novel design coupled with a very efficient and robust detection scheme, resulting in a reduced number of false positives. This is achieved by constructing markers with random circular features in the image domain and encoding them using two true perspective invariants: cross-ratios and intersection preservation constraints. To detect the markers, we developed an effective search scheme, similar to Geometric Hashing and Hough Voting, in which the marker decoding is cast as a retrieval problem. We apply our system to the task of camera calibration and bundle adjustment. With qualitative and quantitative experiments, we demonstrate the robustness and accuracy of X-tag in spite of blur, noise, perspective and radial distortions, and showcase camera calibration, bundle adjustment, and 3D fusion of depth data from precise extrinsic camera poses.
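The cross-ratio invariant mentioned above is the classical projective invariant of four collinear points, preserved under any perspective transformation, which is what makes it usable for decoding a planar tag seen from an arbitrary viewpoint. A small sketch follows, with an illustrative point layout.

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross-ratio (AC * BD) / (BC * AD) of four collinear 2D points."""
    ac = np.linalg.norm(c - a)
    bc = np.linalg.norm(c - b)
    ad = np.linalg.norm(d - a)
    bd = np.linalg.norm(d - b)
    return (ac * bd) / (bc * ad)

pts = [np.array([x, 0.0]) for x in (0.0, 1.0, 3.0, 6.0)]
# The value (1.25 here) stays the same if these points are imaged by a perspective camera.
print(cross_ratio(*pts))
```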
Citations: 18