Learning to Guide Local Feature Matches
Pub Date : 2020-10-21 · DOI: 10.1109/3DV50981.2020.00123
François Darmon, Mathieu Aubry, P. Monasse
We tackle the problem of finding accurate and robust keypoint correspondences between images. We propose a learning-based approach to guide local feature matches via a learned approximate image matching. Our approach can boost the results of SIFT to a level similar to state-of-the-art deep descriptors, such as SuperPoint, ContextDesc, or D2-Net, and can improve performance for these descriptors as well. We introduce and study different levels of supervision for learning coarse correspondences. In particular, we show that weak supervision from epipolar geometry leads to higher performance than the stronger but more biased point-level supervision, and is a clear improvement over weak image-level supervision. We demonstrate the benefits of our approach in a variety of conditions by evaluating our guided keypoint correspondences for localization of internet images on the YFCC100M dataset and indoor images on the SUN3D dataset, for robust localization on the Aachen day-night benchmark, and for 3D reconstruction in challenging conditions using the LTLL historical image data.
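The epipolar weak supervision mentioned above can be grounded in the symmetric epipolar distance: a putative match is consistent with the two-view geometry when each point lies close to the epipolar line induced by the other. The sketch below, in plain numpy, shows that distance computation under the assumption of a known fundamental matrix F; the 3-pixel threshold and the helper name are illustrative, not the paper's exact formulation.

```python
import numpy as np

def symmetric_epipolar_distance(x1, x2, F):
    """Symmetric epipolar distance between matched pixels.

    x1, x2 : (N, 2) pixel coordinates in image 1 and 2.
    F      : (3, 3) fundamental matrix, with x2^T F x1 = 0 for true matches.
    Returns (N,) distances; small values mean a pair is consistent with
    the epipolar geometry and can serve as a weak positive label.
    """
    N = x1.shape[0]
    x1h = np.hstack([x1, np.ones((N, 1))])        # homogeneous coordinates
    x2h = np.hstack([x2, np.ones((N, 1))])
    l2 = x1h @ F.T                                 # epipolar lines in image 2
    l1 = x2h @ F                                   # epipolar lines in image 1
    num = np.abs(np.sum(x2h * l2, axis=1))         # |x2^T F x1|
    d2 = num / np.linalg.norm(l2[:, :2], axis=1)   # point-line distance, image 2
    d1 = num / np.linalg.norm(l1[:, :2], axis=1)   # point-line distance, image 1
    return d1 + d2

# Hypothetical usage: pairs below a pixel threshold act as weak positives.
rng = np.random.default_rng(0)
F = rng.normal(size=(3, 3))
x1, x2 = rng.uniform(0, 640, (100, 2)), rng.uniform(0, 640, (100, 2))
weak_labels = symmetric_epipolar_distance(x1, x2, F) < 3.0
```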
{"title":"Learning to Guide Local Feature Matches","authors":"François Darmon, Mathieu Aubry, P. Monasse","doi":"10.1109/3DV50981.2020.00123","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00123","url":null,"abstract":"We tackle the problem of finding accurate and robust keypoint correspondences between images. We propose a learning-based approach to guide local feature matches via a learned approximate image matching. Our approach can boost the results of SIFT to a level similar to state-of-the-art deep descriptors, such as Superpoint, ContextDesc, or D2-Net and can improve performance for these descriptors. We introduce and study different levels of supervision to learn coarse correspondences. In particular, we show that weak supervision from epipolar geometry leads to performances higher than the stronger but more biased point level supervision and is a clear improvement over weak image level supervision. We demonstrate the benefits of our approach in a variety of conditions by evaluating our guided keypoint correspondences for localization of internet images on the YFCC100M dataset and indoor images on the SUN3D dataset, for robust localization on the Aachen day-night benchmark and for 3D reconstruction in challenging conditions using the LTLL historical image data.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116057940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LCD – Line Clustering and Description for Place Recognition
Pub Date : 2020-10-21 · DOI: 10.1109/3DV50981.2020.00101
Felix Taubner, Florian Tschopp, Tonci Novkovic, R. Siegwart, Fadri Furrer
Current research on visual place recognition mostly focuses on aggregating local visual features of an image into a single vector representation. Therefore, high-level information such as the geometric arrangement of the features is typically lost. In this paper, we introduce a novel learning-based approach to place recognition, using RGB-D cameras and line clusters as visual and geometric features. We state the place recognition problem as a problem of recognizing clusters of lines instead of individual patches, thus maintaining structural information. In our work, line clusters are defined as lines that make up individual objects, hence our place recognition approach can be understood as object recognition. 3D line segments are detected in RGB-D images using state-of-the-art techniques. We present a neural network architecture based on the attention mechanism for frame-wise line clustering. A similar neural network is used to describe these clusters with a compact embedding of 128 floating-point numbers, trained with a triplet loss on training data obtained from the InteriorNet dataset. We show experiments on a large number of indoor scenes and compare our method with the bag-of-words image-retrieval approach using SIFT and SuperPoint features and with the global descriptor NetVLAD. Trained only on synthetic data, our approach generalizes well to real-world data captured with Kinect sensors, while also providing information about the geometric arrangement of instances.
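As a rough illustration of the descriptor training described above, the sketch below pairs a 128-dimensional embedding with PyTorch's built-in triplet margin loss. The ClusterDescriptor MLP, its input dimension, and the margin value are placeholders for the paper's attention-based cluster network, which the abstract does not fully specify.

```python
import torch
import torch.nn as nn

class ClusterDescriptor(nn.Module):
    """Toy stand-in for the cluster description network: maps a cluster
    feature vector to a 128-float embedding, as in the abstract."""
    def __init__(self, in_dim=256, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):
        # L2-normalize so distances live on the unit sphere.
        return nn.functional.normalize(self.net(x), dim=-1)

model = ClusterDescriptor()
loss_fn = nn.TripletMarginLoss(margin=0.2)

# anchor/positive: the same line cluster seen from two viewpoints;
# negative: a different cluster (toy random tensors here).
anchor, positive, negative = (torch.randn(8, 256) for _ in range(3))
loss = loss_fn(model(anchor), model(positive), model(negative))
loss.backward()
```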
{"title":"LCD – Line Clustering and Description for Place Recognition","authors":"Felix Taubner, Florian Tschopp, Tonci Novkovic, R. Siegwart, Fadri Furrer","doi":"10.1109/3DV50981.2020.00101","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00101","url":null,"abstract":"Current research on visual place recognition mostly focuses on aggregating local visual features of an image into a single vector representation. Therefore, high-level information such as the geometric arrangement of the features is typically lost. In this paper, we introduce a novel learning-based approach to place recognition, using RGB-D cameras and line clusters as visual and geometric features. We state the place recognition problem as a problem of recognizing clusters of lines instead of individual patches, thus maintaining structural information. In our work, line clusters are defined as lines that make up individual objects, hence our place recognition approach can be understood as object recognition. 3D line segments are detected in RGB-D images using state-of-the-art techniques. We present a neural network architecture based on the attention mechanism for frame-wise line clustering. A similar neural network is used for the description of these clusters with a compact embedding of 128 floating point numbers, trained with triplet loss on training data obtained from the InteriorNet dataset. We show experiments on a large number of indoor scenes and compare our method with the bag-of-words image-retrieval approach using SIFT and SuperPoint features and the global descriptor NetVLAD. Trained only on synthetic data, our approach generalizes well to real-world data captured with Kinect sensors, while also providing information about the geometric arrangement of instances.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132059247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MaskNet: A Fully-Convolutional Network to Estimate Inlier Points
Pub Date : 2020-10-19 · DOI: 10.1109/3DV50981.2020.00113
Vinit Sarode, Animesh Dhagat, Rangaprasad Arun Srivatsan, N. Zevallos, S. Lucey, H. Choset
Point clouds have grown in importance in the way computers perceive the world. From LIDAR sensors in autonomous cars and drones to the time-of-flight and stereo vision systems in our phones, point clouds are everywhere. Despite their ubiquity, point clouds in the real world are often missing points because of sensor limitations or occlusions, or contain extraneous points from sensor noise or artifacts. These problems challenge algorithms that require computing correspondences between a pair of point clouds. Therefore, this paper presents a fully-convolutional neural network that identifies which points in one point cloud are most similar (inliers) to the points in another. We show improvements in learning-based and classical point cloud registration approaches when they are retrofitted with our network. We demonstrate these improvements on synthetic and real-world datasets. Finally, our network produces impressive results on test datasets that were unseen during training, thus exhibiting generalizability. Code and videos are available at https://github.com/vinits5/masknet
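The "retrofit" idea reduces to filtering a point cloud by predicted per-point inlier scores before handing it to any registration method. A minimal sketch, assuming the mask network outputs one probability per point; apply_inlier_mask and the 0.5 threshold are hypothetical stand-ins, not MaskNet's actual interface.

```python
import torch

def apply_inlier_mask(template, scores, threshold=0.5):
    """Keep only points the network deems inliers before registration.

    template : (N, 3) source point cloud.
    scores   : (N,) per-point inlier probabilities from the mask network.
    Returns the filtered cloud to hand to any registration method
    (ICP, a learned registration network, ...).
    """
    return template[scores > threshold]

# Toy usage with random stand-ins for the network output.
template = torch.randn(1024, 3)
scores = torch.rand(1024)                # would come from the mask network
filtered = apply_inlier_mask(template, scores)
```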
{"title":"MaskNet: A Fully-Convolutional Network to Estimate Inlier Points","authors":"Vinit Sarode, Animesh Dhagat, Rangaprasad Arun Srivatsan, N. Zevallos, S. Lucey, H. Choset","doi":"10.1109/3DV50981.2020.00113","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00113","url":null,"abstract":"Point clouds have grown in importance in the way computers perceive the world. From LIDAR sensors in autonomous cars and drones to the time of flight and stereo vision systems in our phones, point clouds are everywhere. Despite their ubiquity, point clouds in the real world are often missing points because of sensor limitations or occlusions, or contain extraneous points from sensor noise or artifacts. These problems challenge algorithms that require computing correspondences between a pair of point clouds. Therefore, this paper presents a fully-convolutional neural network that identifies which points in one point cloud are most similar (inliers) to the points in another. We show improvements in learning-based and classical point cloud registration approaches when retrofitted with our network. We demonstrate these improvements on synthetic and real-world datasets. Finally, our network produces impressive results on test datasets that were unseen during training, thus exhibiting generalizability. Code and videos are available at https://github.com/vinits5/masknet","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126908873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GAMesh: Guided and Augmented Meshing for Deep Point Networks
Pub Date : 2020-10-19 · DOI: 10.1109/3DV50981.2020.00080
Nitin Agarwal, M. Gopi
We present a new meshing algorithm called guided and augmented meshing, GAMesh, which uses a mesh prior to generate a surface for the output points of a point network. By projecting the output points onto this prior and simplifying the resulting mesh, GAMesh ensures a surface with the same topology as the mesh prior but whose geometric fidelity is controlled by the point network. This makes GAMesh independent of both the density and distribution of the output points, a common source of artifacts in traditional surface reconstruction algorithms. We show that such a separation of geometry from topology can have several advantages, especially in single-view shape prediction, fair evaluation of point networks, and reconstructing surfaces for networks which output sparse point clouds. We further show that by training point networks with GAMesh, we can directly optimize the vertex positions to generate adaptive meshes with arbitrary topologies. Code and data are available on the project webpage: https://www.ics.uci.edu/~agarwal/GAMesh
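The core projection step can be illustrated with a deliberately simplified version: snapping each network output point to its nearest vertex of the mesh prior. A faithful implementation would project onto the prior's triangle faces; the nearest-vertex shortcut below only conveys how topology can come from the prior while geometry comes from the point network.

```python
import numpy as np

def project_to_prior(points, prior_vertices):
    """Assign each output point to its nearest vertex on the mesh prior.

    points         : (P, 3) point-network output.
    prior_vertices : (V, 3) vertices of the mesh prior.
    Returns (P,) indices of the nearest prior vertex per point.
    """
    # (P, V) pairwise squared distances, then nearest prior vertex per point.
    d2 = ((points[:, None, :] - prior_vertices[None, :, :]) ** 2).sum(-1)
    return np.argmin(d2, axis=1)

points = np.random.rand(2048, 3)           # toy point-network output
prior_vertices = np.random.rand(500, 3)    # toy mesh-prior vertices
assignment = project_to_prior(points, prior_vertices)
```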
{"title":"GAMesh: Guided and Augmented Meshing for Deep Point Networks","authors":"Nitin Agarwal, M. Gopi","doi":"10.1109/3DV50981.2020.00080","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00080","url":null,"abstract":"We present a new meshing algorithm called guided and augmented meshing, GAMesh, which uses a mesh prior to generate a surface for the output points of a point network. By projecting the output points onto this prior and simplifying the resulting mesh, GAMesh ensures a surface with the same topology as the mesh prior but whose geometric fidelity is controlled by the point network. This makes GAMesh independent of both the density and distribution of the output points, a common artifact in traditional surface reconstruction algorithms. We show that such a separation of geometry from topology can have several advantages especially in single-view shape prediction, fair evaluation of point networks and reconstructing surfaces for networks which output sparse point clouds. We further show that by training point networks with GAMesh, we can directly optimize the vertex positions to generate adaptive meshes with arbitrary topologies. Code and data are available on the project webpage1.1https://www.ics.uci.edu/∼agarwal/GAMesh","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126251775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graphite: Graph-Induced Feature Extraction for Point Cloud Registration
Pub Date : 2020-10-18 · DOI: 10.1109/3DV50981.2020.00034
Mahdi Saleh, Shervin Dehghani, Benjamin Busam, N. Navab, Federico Tombari
3D point clouds are a rich source of information that enjoys growing popularity in the vision community. However, due to the sparsity of their representation, learning models based on large point clouds is still a challenge. In this work, we introduce Graphite, a GRAPH-Induced feaTure Extraction pipeline, a simple yet powerful feature transform and keypoint detector. Graphite enables intensive down-sampling of point clouds with keypoint detection accompanied by a descriptor. We construct a generic graph-based learning scheme to describe point cloud regions and extract salient points. To this end, we take advantage of 6D pose information and metric learning to learn robust descriptions and keypoints across different scans. We reformulate the 3D keypoint pipeline with graph neural networks, which allow efficient processing of the point set while boosting its descriptive power, ultimately resulting in more accurate 3D registrations. We demonstrate our lightweight descriptor on common 3D descriptor matching and point cloud registration benchmarks [76], [71] and achieve comparable results with the state of the art. Describing 100 patches of a point cloud and detecting their keypoints takes only 0.018 seconds with our proposed network.
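The graph induction over point cloud regions can be pictured as a k-nearest-neighbor graph per patch, which a graph neural network layer then consumes. The abstract does not specify Graphite's exact graph construction, so the sketch below is a generic, assumption-laden baseline.

```python
import numpy as np

def knn_graph(points, k=8):
    """Build the k-nearest-neighbor edges a graph network could consume.

    points : (N, 3) patch of a point cloud.
    Returns (N, k) indices of each point's k nearest neighbors
    (self-loops excluded).
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-loops
    return np.argsort(d2, axis=1)[:, :k]

patch = np.random.rand(256, 3)                   # toy point cloud patch
edges = knn_graph(patch)                         # edge lists for a GNN layer
```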
{"title":"Graphite: Graph-Induced Feature Extraction for Point Cloud Registration","authors":"Mahdi Saleh, Shervin Dehghani, Benjamin Busam, N. Navab, Federico Tombari","doi":"10.1109/3DV50981.2020.00034","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00034","url":null,"abstract":"3D Point clouds are a rich source of information that enjoy growing popularity in the vision community. However, due to the sparsity of their representation, learning models based on large point clouds is still a challenge. In this work, we introduce Graphite, a GRAPH-Induced feaTure Extraction pipeline, a simple yet powerful feature transform and keypoint detector. Graphite enables intensive down-sampling of point clouds with keypoint detection accompanied by a descriptor. We construct a generic graph-based learning scheme to describe point cloud regions and extract salient points. To this end, we take advantage of 6D pose information and metric learning to learn robust descriptions and keypoints across different scans. We Reformulate the 3D keypoint pipeline with graph neural networks which allow efficient processing of the point set while boosting its descriptive power which ultimately results in more accurate 3D registrations. We demonstrate our lightweight descriptor on common 3D descriptor matching and point cloud registration benchmarks [76], [71] and achieve comparable results with the state of the art. Describing 100 patches of a point cloud and detecting their keypoints takes only 0.018 seconds with our proposed network.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114238433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Monocular Dense Depth from Events
Pub Date : 2020-10-16 · DOI: 10.1109/3DV50981.2020.00063
Javier Hidalgo-Carrió, Daniel Gehrig, D. Scaramuzza
Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. Compared to conventional image sensors, they offer significant advantages: high temporal resolution, high dynamic range, no motion blur, and much lower bandwidth. Recently, learning-based approaches have been applied to event-based data, thus unlocking their potential and making significant progress in a variety of tasks, such as monocular depth prediction. Most existing approaches use standard feed-forward architectures to generate network predictions, which do not leverage the temporal consistency present in the event stream. We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods. In particular, our method generates dense depth predictions using a monocular setup, which has not been shown previously. We pretrain our model using a new dataset containing events and depth maps recorded in the CARLA simulator. We test our method on the Multi Vehicle Stereo Event Camera Dataset (MVSEC). Quantitative experiments show up to 50% improvement in average depth error with respect to previous event-based methods. Code and dataset are available at: http://rpg.ifi.uzh.ch/e2depth
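One way to carry temporal state across the event stream, in the spirit of the recurrent architecture described above, is a convolutional GRU over per-time-step event features. The sketch below is a generic ConvGRU cell with a 1x1 depth head, not the paper's actual network or training setup.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU cell: hidden state h carries temporal
    context from earlier event slices into the current prediction."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # update/reset
        self.cand = nn.Conv2d(2 * ch, ch, 3, padding=1)       # candidate state

    def forward(self, x, h):
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, 1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde

cell = ConvGRUCell(ch=16)
depth_head = nn.Conv2d(16, 1, 1)      # hidden state -> dense depth map
h = torch.zeros(1, 16, 64, 64)
for _ in range(5):                    # one event feature map per time step
    x = torch.randn(1, 16, 64, 64)    # stand-in for encoded event slice
    h = cell(x, h)
depth = depth_head(h)                 # (1, 1, 64, 64) dense prediction
```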
{"title":"Learning Monocular Dense Depth from Events","authors":"Javier Hidalgo-Carri'o, Daniel Gehrig, D. Scaramuzza","doi":"10.1109/3DV50981.2020.00063","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00063","url":null,"abstract":"Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous ”events” instead of intensity frames. Compared to conventional image sensors, they offer significant advantages: high temporal resolution, high dynamic range, no motion blur, and much lower bandwidth. Recently, learning-based approaches have been applied to event-based data, thus unlocking their potential and making significant progress in a variety of tasks, such as monocular depth prediction. Most existing approaches use standard feed-forward architectures to generate network predictions, which do not leverage the temporal consistency presents in the event stream. We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods. In particular, our method generates dense depth predictions using a monocular setup, which has not been shown previously. We pretrain our model using a new dataset containing events and depth maps recorded in the CARLA simulator. We test our method on the Multi Vehicle Stereo Event Camera Dataset (MVSEC). Quantitative experiments show up to 50% improvement in average depth error with respect to previous event-based methods. Code and dataset are available at: http://rpg.ifi.uzh.ch/e2depth","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125283922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Better Patch Stitching for Parametric Surface Reconstruction
Pub Date : 2020-10-14 · DOI: 10.1109/3DV50981.2020.00069
Zhantao Deng, Jan Bednarík, M. Salzmann, P. Fua
Recently, parametric mappings have emerged as highly effective surface representations, yielding low reconstruction error. In particular, the latest works represent the target shape as an atlas of multiple mappings, which can closely encode object parts. Atlas representations, however, suffer from one major drawback: the individual mappings are not guaranteed to be consistent, which results in holes in the reconstructed shape or in jagged surface areas. We introduce an approach that explicitly encourages global consistency of the local mappings. To this end, we introduce two novel loss terms. The first term exploits the surface normals and requires that they remain locally consistent when estimated within and across the individual mappings. The second term further encourages a better spatial configuration of the mappings by minimizing a novel stitching error. We show on standard benchmarks that the use of the normal consistency requirement outperforms the baselines quantitatively, while enforcing better stitching leads to much better visual quality of the reconstructed objects compared to the state of the art.
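The first loss term relies on normals estimated from the mappings themselves. For a chart f : (u, v) -> R^3, the normal is the normalized cross product of the tangent vectors df/du and df/dv, which the finite-difference sketch below computes for a toy MLP chart; the final penalty shown is only a schematic stand-in for the paper's within- and across-mapping consistency terms.

```python
import torch
import torch.nn as nn

def mapping_normals(f, uv, eps=1e-4):
    """Estimate surface normals of a parametric mapping f : (u,v) -> R^3
    by central finite differences: n = normalize(df/du x df/dv)."""
    du = torch.tensor([eps, 0.0])
    dv = torch.tensor([0.0, eps])
    t_u = (f(uv + du) - f(uv - du)) / (2 * eps)   # tangent along u
    t_v = (f(uv + dv) - f(uv - dv)) / (2 * eps)   # tangent along v
    n = torch.cross(t_u, t_v, dim=-1)
    return nn.functional.normalize(n, dim=-1)

# Toy MLP standing in for one learned atlas chart.
chart = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 3))
uv = torch.rand(128, 2)
normals = mapping_normals(chart, uv)

# Schematic consistency-style penalty between sample pairs; the paper
# compares normals estimated within and across the individual mappings.
loss = (1 - (normals[:-1] * normals[1:]).sum(-1)).mean()
```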
{"title":"Better Patch Stitching for Parametric Surface Reconstruction","authors":"Zhantao Deng, Jan Bednarík, M. Salzmann, P. Fua","doi":"10.1109/3DV50981.2020.00069","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00069","url":null,"abstract":"Recently, parametric mappings have emerged as highly effective surface representations, yielding low reconstruction error. In particular, the latest works represent the target shape as an atlas of multiple mappings, which can closely encode object parts. Atlas representations, however, suffer from one major drawback: The individual mappings are not guaranteed to be consistent, which results in holes in the reconstructed shape or in jagged surface areas.We introduce an approach that explicitly encourages global consistency of the local mappings. To this end, we introduce two novel loss terms. The first term exploits the surface normals and requires that they remain locally consistent when estimated within and across the individual mappings. The second term further encourages better spatial configuration of the mappings by minimizing novel stitching error. We show on standard benchmarks that the use of normal consistency requirement outperforms the baselines quantitatively while enforcing better stitching leads to much better visual quality of the reconstructed objects as compared to the state-of-the-art.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115552986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do End-to-end Stereo Algorithms Under-utilize Information?
Pub Date : 2020-10-14 · DOI: 10.1109/3DV50981.2020.00047
Changjiang Cai, Philippos Mordohai
Deep networks for stereo matching typically leverage 2D or 3D convolutional encoder-decoder architectures to aggregate cost and regularize the cost volume for accurate disparity estimation. Due to content-insensitive convolutions and down-sampling and up-sampling operations, these cost aggregation mechanisms do not take full advantage of the information available in the images. Disparity maps suffer from over-smoothing near occlusion boundaries and from erroneous predictions in thin structures. In this paper, we show how deep adaptive filtering and differentiable semi-global aggregation can be integrated into existing 2D and 3D convolutional networks for end-to-end stereo matching, leading to improved accuracy. The improvements are due to utilizing RGB information from the images as a signal to dynamically guide the matching process, in addition to its being the signal we attempt to match across the images. We show extensive experimental results on the KITTI 2015 and Virtual KITTI 2 datasets, comparing four stereo networks (DispNetC, GCNet, PSMNet and GANet) after integrating four adaptive filters (segmentation-aware bilateral filtering, dynamic filtering networks, pixel adaptive convolution and semi-global aggregation) into their architectures. Our code is available at https://github.com/ccj5351/DAFStereoNets.
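The guiding idea, using RGB as a signal that steers aggregation rather than only as the thing being matched, can be illustrated with a toy bilateral filtering pass over a cost volume: neighboring costs are averaged with weights that collapse across color edges. This is a hand-rolled illustration, not any of the four published filters the paper integrates.

```python
import torch

def bilateral_cost_filter(cost, rgb, sigma=0.1):
    """Content-adaptive smoothing of a stereo cost volume.

    cost : (D, H, W) matching cost per disparity hypothesis.
    rgb  : (3, H, W) reference image guiding the filtering.
    Averages each pixel's cost with its 4 neighbors, weighted by RGB
    similarity so depth edges are preserved. torch.roll wraps at image
    borders; a toy simplification, acceptable for illustration.
    """
    filtered = cost.clone()
    weight_sum = torch.ones_like(cost)
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        shifted_rgb = torch.roll(rgb, shifts=(dy, dx), dims=(1, 2))
        w = torch.exp(-((rgb - shifted_rgb) ** 2).sum(0) / (2 * sigma ** 2))
        filtered += w * torch.roll(cost, shifts=(dy, dx), dims=(1, 2))
        weight_sum += w
    return filtered / weight_sum

cost = torch.rand(64, 32, 32)    # D=64 disparity hypotheses
rgb = torch.rand(3, 32, 32)
smoothed = bilateral_cost_filter(cost, rgb)
```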
{"title":"Do End-to-end Stereo Algorithms Under-utilize Information?","authors":"Changjiang Cai, Philippos Mordohai","doi":"10.1109/3DV50981.2020.00047","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00047","url":null,"abstract":"Deep networks for stereo matching typically leverage 2D or 3D convolutional encoder-decoder architectures to aggregate cost and regularize the cost volume for accurate disparity estimation. Due to content-insensitive convolutions and down-sampling and up-sampling operations, these cost aggregation mechanisms do not take full advantage of the information available in the images. Disparity maps suffer from over-smoothing near occlusion boundaries, and erroneous predictions in thin structures. In this paper, we show how deep adaptive filtering and differentiable semi-global aggregation can be integrated in existing 2D and 3D convolutional networks for end-to-end stereo matching, leading to improved accuracy. The improvements are due to utilizing RGB information from the images as a signal to dynamically guide the matching process, in addition to being the signal we attempt to match across the images. We show extensive experimental results on the KITTI 2015 and Virtual KITTI 2 datasets comparing four stereo networks (DispNetC, GCNet, PSMNet and GANet) after integrating four adaptive filters (segmentation-aware bilateral filtering, dynamic filtering networks, pixel adaptive convolution and semi-global aggregation) into their architectures. Our code is available at https://github.com/ccj5351/DAFStereoNets.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"247 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114605952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A New Distributional Ranking Loss With Uncertainty: Illustrated in Relative Depth Estimation
Pub Date : 2020-10-14 · DOI: 10.1109/3DV50981.2020.00118
Alican Mertan, Y. Sahin, D. Duff, Gözde B. Ünal
We propose a new approach to the problem of relative depth estimation from a single image. Instead of directly regressing over depth scores, we formulate the problem as estimation of a probability distribution over depth, and aim to learn the parameters of the distributions which maximize the likelihood of the given data. To train our model, we propose a new ranking loss, the Distributional Loss, which tries to increase the probability of the farther pixel's depth being greater than that of the closer pixel. Our proposed approach allows our model to output confidence in its estimation, in the form of the standard deviation of the distribution. We achieve state-of-the-art results against a number of baselines while providing confidence in our estimations. Our analysis shows that the estimated confidence is actually a good indicator of accuracy. We also investigate the use of confidence information in the downstream task of metric depth estimation to increase its performance.
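If each pixel's depth is modeled as an independent Gaussian, one plausible reading of the abstract, the probability that the farther pixel is indeed deeper has a closed form, and the ranking loss is its negative log-likelihood. The sketch below implements that reading; the paper's exact parameterization may differ.

```python
import torch

def distributional_ranking_loss(mu_far, sigma_far, mu_near, sigma_near):
    """Ranking loss over per-pixel depth distributions.

    For independent Gaussians d_far ~ N(mu_far, sigma_far^2) and
    d_near ~ N(mu_near, sigma_near^2), the probability that the farther
    pixel's depth exceeds the nearer one's is
        P = Phi((mu_far - mu_near) / sqrt(sigma_far^2 + sigma_near^2)),
    and the loss maximizes its log-likelihood.
    """
    z = (mu_far - mu_near) / torch.sqrt(sigma_far ** 2 + sigma_near ** 2)
    normal = torch.distributions.Normal(0.0, 1.0)
    p = normal.cdf(z).clamp_min(1e-6)    # guard against log(0)
    return -torch.log(p).mean()

# Toy pair batch: the network outputs (mu, sigma) per pixel of a pair
# ordered so the first element is ground-truth farther.
mu_f, mu_n = torch.randn(32), torch.randn(32)
s_f, s_n = torch.rand(32) + 0.1, torch.rand(32) + 0.1
loss = distributional_ranking_loss(mu_f, s_f, mu_n, s_n)
```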
{"title":"A New Distributional Ranking Loss With Uncertainty: Illustrated in Relative Depth Estimation","authors":"Alican Mertan, Y. Sahin, D. Duff, Gözde B. Ünal","doi":"10.1109/3DV50981.2020.00118","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00118","url":null,"abstract":"We propose a new approach for the problem of relative depth estimation from a single image. Instead of directly regressing over depth scores, we formulate the problem as estimation of a probability distribution over depth and aim to learn the parameters of the distributions which maximize the likelihood of the given data. To train our model, we propose a new ranking loss, Distributional Loss, which tries to increase the probability of farther pixel’s depth being greater than the closer pixel’s depth. Our proposed approach allows our model to output confidence in its estimation in the form of standard deviation of the distribution. We achieve state of the art results against a number of baselines while providing confidence in our estimations. Our analysis show that estimated confidence is actually a good indicator of accuracy. We investigate the usage of confidence information in a downstream task of metric depth estimation, to increase its performance.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121768730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matching-space Stereo Networks for Cross-domain Generalization
Pub Date : 2020-10-14 · DOI: 10.1109/3DV50981.2020.00046
Changjiang Cai, Matteo Poggi, S. Mattoccia, Philippos Mordohai
End-to-end deep networks represent the state of the art for stereo matching. While they excel on images framing environments similar to the training set, major drops in accuracy occur in unseen domains (e.g., when moving from synthetic to real scenes). In this paper we introduce a novel family of architectures, namely Matching-Space Networks (MS-Nets), with improved generalization properties. By replacing learning-based feature extraction from image RGB values with matching functions and confidence measures from conventional wisdom, we move the learning process from the color space to the Matching Space, avoiding over-specialization to domain-specific features. Extensive experimental results on four real datasets highlight that our proposal leads to superior generalization to unseen environments over conventional deep architectures, keeping accuracy on the source domain almost unaltered. Our code is available at https://github.com/ccj5351/MS-Nets.
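A concrete example of a matching function from conventional wisdom is zero-normalized cross-correlation: for each pixel, the ZNCC scores across candidate disparities form a cost curve that can replace raw RGB as network input. The sketch below computes such a curve for one pixel; the paper's actual set of matching functions and confidence measures is not reproduced here.

```python
import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    """Zero-normalized cross-correlation between two same-size patches:
    a classical, illumination-robust matching function."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

# Toy usage: score one left-image patch against right-image candidates
# at several disparities; the resulting cost curve, not the RGB values,
# is what a matching-space network would consume.
left = np.random.rand(480, 640)
right = np.random.rand(480, 640)
y, x, r = 100, 200, 4                       # pixel location, patch radius
ref = left[y - r:y + r + 1, x - r:x + r + 1]
costs = [zncc(ref, right[y - r:y + r + 1, x - d - r:x - d + r + 1])
         for d in range(32)]                # one score per disparity
```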
{"title":"Matching-space Stereo Networks for Cross-domain Generalization","authors":"Changjiang Cai, Matteo Poggi, S. Mattoccia, Philippos Mordohai","doi":"10.1109/3DV50981.2020.00046","DOIUrl":"https://doi.org/10.1109/3DV50981.2020.00046","url":null,"abstract":"End-to-end deep networks represent the state of the art for stereo matching. While excelling on images framing environments similar to the training set, major drops in accuracy occur in unseen domains (e.g., when moving from synthetic to real scenes). In this paper we introduce a novel family of architectures, namely Matching-Space Networks (MS-Nets), with improved generalization properties. By replacing learning-based feature extraction from image RGB values with matching functions and confidence measures from conventional wisdom, we move the learning process from the color space to the Matching Space, avoiding over-specialization to domain specific features. Extensive experimental results on four real datasets highlight that our proposal leads to superior generalization to unseen environments over conventional deep architectures, keeping accuracy on the source domain almost unaltered. Our code is available at https://qithub.com/ccj5351/MS-Nets.","PeriodicalId":293399,"journal":{"name":"2020 International Conference on 3D Vision (3DV)","volume":"88 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133356002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}