
Latest publications: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Explainability Methods for Graph Convolutional Neural Networks
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.01103
Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, Heiko Hoffmann
With the growing use of graph convolutional neural networks (GCNNs) comes the need for explainability. In this paper, we introduce explainability methods for GCNNs. We develop the graph analogues of three prominent explainability methods for convolutional neural networks: contrastive gradient-based (CG) saliency maps, Class Activation Mapping (CAM), and Excitation Back-Propagation (EB) and their variants, gradient-weighted CAM (Grad-CAM) and contrastive EB (c-EB). We show a proof-of-concept of these methods on classification problems in two application domains: visual scene graphs and molecular graphs. To compare the methods, we identify three desirable properties of explanations: (1) their importance to classification, as measured by the impact of occlusions, (2) their contrastivity with respect to different classes, and (3) their sparseness on a graph. We call the corresponding quantitative metrics fidelity, contrastivity, and sparsity and evaluate them for each method. Lastly, we analyze the salient subgraphs obtained from explanations and report frequently occurring patterns.
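To make the graph analogue of Grad-CAM concrete, here is a minimal PyTorch sketch (our illustration, not the authors' code; the two-layer network, toy graph, mean-pooled readout, and class index are assumptions): class-score gradients at the last graph-convolution layer are averaged into channel weights and combined with the activations to score each node.

```python
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    """Two-layer graph convolution with a mean-pooled graph-level readout."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, n_classes)

    def forward(self, A_hat, X):
        H = torch.relu(self.w1(A_hat @ X))       # activations of the last conv layer
        H.retain_grad()                          # keep gradients for Grad-CAM
        logits = self.w2(A_hat @ H).mean(dim=0)  # mean-pool nodes -> graph logits
        return logits, H

# Toy 4-node graph with self-loops and symmetric normalization.
A = torch.eye(4) + torch.tensor([[0, 1, 0, 0], [1, 0, 1, 0],
                                 [0, 1, 0, 1], [0, 0, 1, 0]], dtype=torch.float)
d_inv_sqrt = torch.diag(A.sum(1).pow(-0.5))
A_hat = d_inv_sqrt @ A @ d_inv_sqrt
X = torch.randn(4, 3)

model = TinyGCN(3, 8, 2)
logits, H = model(A_hat, X)
logits[1].backward()                             # gradient of the chosen class score

alpha = H.grad.mean(dim=0)                       # channel importance weights
node_saliency = torch.relu(H.detach() @ alpha)   # Grad-CAM-style score per node
print(node_saliency)
```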
{"title":"Explainability Methods for Graph Convolutional Neural Networks","authors":"Phillip E. Pope, Soheil Kolouri, Mohammad Rostami, Charles E. Martin, Heiko Hoffmann","doi":"10.1109/CVPR.2019.01103","DOIUrl":"https://doi.org/10.1109/CVPR.2019.01103","url":null,"abstract":"With the growing use of graph convolutional neural networks (GCNNs) comes the need for explainability. In this paper, we introduce explainability methods for GCNNs. We develop the graph analogues of three prominent explainability methods for convolutional neural networks: contrastive gradient-based (CG) saliency maps, Class Activation Mapping (CAM), and Excitation Back-Propagation (EB) and their variants, gradient-weighted CAM (Grad-CAM) and contrastive EB (c-EB). We show a proof-of-concept of these methods on classification problems in two application domains: visual scene graphs and molecular graphs. To compare the methods, we identify three desirable properties of explanations: (1) their importance to classification, as measured by the impact of occlusions, (2) their contrastivity with respect to different classes, and (3) their sparseness on a graph. We call the corresponding quantitative metrics fidelity, contrastivity, and sparsity and evaluate them for each method. Lastly, we analyze the salient subgraphs obtained from explanations and report frequently occurring patterns.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"10764-10773"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83481975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 326
Residual Regression With Semantic Prior for Crowd Counting
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00416
Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, Wei Liu
Crowd counting is a challenging task due to factors such as large variations in crowdedness and severe occlusions. Although recent deep learning based counting algorithms have achieved great progress, the correlation knowledge among samples and the semantic prior have not yet been fully exploited. In this paper, a residual regression framework is proposed for crowd counting utilizing the correlation information among samples. By incorporating such information into our network, we discover that more intrinsic characteristics can be learned by the network, which thus generalizes better to unseen scenarios. Besides, we show how to effectively leverage the semantic prior to improve the performance of crowd counting. We also observe that the adversarial loss can be used to improve the quality of predicted density maps, thus leading to an improvement in crowd counting. Experiments on public datasets demonstrate the effectiveness and generalization ability of the proposed method.
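As a rough illustration of the residual idea (our own simplification; the backbone, the way support samples are retrieved, and all shapes are assumptions rather than the paper's architecture), the network predicts only the difference between the query's density map and that of a correlated support sample:

```python
import torch
import torch.nn as nn

class ResidualCounter(nn.Module):
    """Predicts a residual density map on top of a support sample's density map."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, query_img, support_density):
        residual = self.backbone(query_img)        # learn only the difference
        return support_density + residual          # final density estimate

model = ResidualCounter()
query = torch.randn(1, 3, 64, 64)                  # query crowd image
support_density = torch.rand(1, 1, 64, 64) * 0.01  # density map of a similar sample
density = model(query, support_density)
print("estimated count:", density.sum().item())    # crowd count = integral of density
```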
{"title":"Residual Regression With Semantic Prior for Crowd Counting","authors":"Jia Wan, Wenhan Luo, Baoyuan Wu, Antoni B. Chan, Wei Liu","doi":"10.1109/CVPR.2019.00416","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00416","url":null,"abstract":"Crowd counting is a challenging task due to factors such as large variations in crowdedness and severe occlusions. Although recent deep learning based counting algorithms have achieved a great progress, the correlation knowledge among samples and the semantic prior have not yet been fully exploited. In this paper, a residual regression framework is proposed for crowd counting utilizing the correlation information among samples. By incorporating such information into our network, we discover that more intrinsic characteristics can be learned by the network which thus generalizes better to unseen scenarios. Besides, we show how to effectively leverage the semantic prior to improve the performance of crowd counting. We also observe that the adversarial loss can be used to improve the quality of predicted density maps, thus leading to an improvement in crowd counting. Experiments on public datasets demonstrate the effectiveness and generalization ability of the proposed method.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"27 1","pages":"4031-4040"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87297270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 100
Monocular Depth Estimation Using Relative Depth Maps
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00996
Jae-Han Lee, Chang-Su Kim
We propose a novel algorithm for monocular depth estimation using relative depth maps. First, using a convolutional neural network, we estimate relative depths between pairs of regions, as well as ordinary depths, at various scales. Second, we restore relative depth maps from selectively estimated data based on the rank-1 property of pairwise comparison matrices. Third, we decompose ordinary and relative depth maps into components and recombine them optimally to reconstruct a final depth map. Experimental results show that the proposed algorithm provides state-of-the-art depth estimation performance.
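The rank-1 property used in the second step can be checked with a small worked example (our sketch; the noise model and the recovery via the leading eigenvector are illustrative assumptions): if R[i, j] = d_i / d_j, then R = d (1/d)^T has rank one and R d = n d, so the dominant eigenvector recovers the depths up to scale even from a noisy estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.uniform(1.0, 5.0, size=6)                 # "true" relative depths
R = d[:, None] / d[None, :]                       # pairwise comparison matrix, rank 1
R_noisy = R * rng.normal(1.0, 0.05, R.shape)      # simulated estimation noise

# The dominant eigenvector of R_noisy approximates d up to a positive scale.
vals, vecs = np.linalg.eig(R_noisy)
v = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
v_scaled = v * (d[0] / v[0])                      # fix the scale for comparison
print(np.round(d, 3))
print(np.round(v_scaled, 3))
```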
{"title":"Monocular Depth Estimation Using Relative Depth Maps","authors":"Jae-Han Lee, Chang-Su Kim","doi":"10.1109/CVPR.2019.00996","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00996","url":null,"abstract":"We propose a novel algorithm for monocular depth estimation using relative depth maps. First, using a convolutional neural network, we estimate relative depths between pairs of regions, as well as ordinary depths, at various scales. Second, we restore relative depth maps from selectively estimated data based on the rank-1 property of pairwise comparison matrices. Third, we decompose ordinary and relative depth maps into components and recombine them optimally to reconstruct a final depth map. Experimental results show that the proposed algorithm provides the state-of-art depth estimation performance.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"5 1","pages":"9721-9730"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87736584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 105
Deeply-Supervised Knowledge Synergy
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00716
Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao
Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet. However, the prevailing training scheme still follows the earlier practice of adding supervision only to the last layer of the network and propagating error information up layer by layer. In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. Inspired by the deeply-supervised learning scheme, we first append auxiliary supervision branches on top of certain intermediate network layers. While properly using auxiliary supervision can improve model accuracy to some degree, we go one step further to explore the possibility of utilizing the probabilistic knowledge dynamically learnt by the classifiers connected to the backbone network as a new regularization to improve the training. A novel synergy loss, which considers pairwise knowledge matching among all supervision branches, is presented. Intriguingly, it enables dense pairwise knowledge matching operations in both top-down and bottom-up directions at each training iteration, resembling a dynamic synergy process for the same task. We evaluate DKS on image classification datasets using state-of-the-art CNN architectures, and show that the models trained with it are consistently better than the corresponding counterparts. For instance, on the ImageNet classification benchmark, our ResNet-152 model outperforms the baseline model by a 1.47% margin in Top-1 accuracy. Code is available at https://github.com/sundw2014/DKS.
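A minimal sketch of a pairwise knowledge-matching term of this kind is shown below (our assumption about the form, based on softened KL matching; the temperature, the detach on the teacher side, and the averaging are not taken from the released code):

```python
import torch
import torch.nn.functional as F

def synergy_loss(branch_logits, T=2.0):
    """Match every branch's softened prediction against every other branch's."""
    loss = 0.0
    n = len(branch_logits)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            student = F.log_softmax(branch_logits[i] / T, dim=1)
            teacher = F.softmax(branch_logits[j] / T, dim=1).detach()
            loss = loss + F.kl_div(student, teacher, reduction="batchmean")
    return loss / (n * (n - 1))

logits = [torch.randn(8, 10) for _ in range(3)]   # three supervision branches, 10 classes
print(synergy_loss(logits))
```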
{"title":"Deeply-Supervised Knowledge Synergy","authors":"Dawei Sun, Anbang Yao, Aojun Zhou, Hao Zhao","doi":"10.1109/CVPR.2019.00716","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00716","url":null,"abstract":"Convolutional Neural Networks (CNNs) have become deeper and more complicated compared with the pioneering AlexNet. However, current prevailing training scheme follows the previous way of adding supervision to the last layer of the network only and propagating error information up layer-by-layer. In this paper, we propose Deeply-supervised Knowledge Synergy (DKS), a new method aiming to train CNNs with improved generalization ability for image classification tasks without introducing extra computational cost during inference. Inspired by the deeply-supervised learning scheme, we first append auxiliary supervision branches on top of certain intermediate network layers. While properly using auxiliary supervision can improve model accuracy to some degree, we go one step further to explore the possibility of utilizing the probabilistic knowledge dynamically learnt by the classifiers connected to the backbone network as a new regularization to improve the training. A novel synergy loss, which considers pairwise knowledge matching among all supervision branches, is presented. Intriguingly, it enables dense pairwise knowledge matching operations in both top-down and bottom-up directions at each training iteration, resembling a dynamic synergy process for the same task. We evaluate DKS on image classification datasets using state-of-the-art CNN architectures, and show that the models trained with it are consistently better than the corresponding counterparts. For instance, on the ImageNet classification benchmark, our ResNet-152 model outperforms the baseline model with a 1.47% margin in Top-1 accuracy. Code is available at https://github.com/sundw2014/DKS.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"61 1","pages":"6990-6999"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90591628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00558
Nanyang Ye, Krzysztof Wolski, Rafał K. Mantiuk
Numerous applications require a robust metric that can predict whether image differences are visible or not. However, the accuracy of existing white-box visibility metrics, such as HDR-VDP, is often not good enough. CNN-based black-box visibility metrics have proven to be more accurate, but they cannot account for differences in viewing conditions, such as display brightness and viewing distance. In this paper, we propose a CNN-based visibility metric which maintains the accuracy of deep network solutions and accounts for viewing conditions. To achieve this, we extend the existing dataset of locally visible differences (LocVis) with a new set of measurements collected under the aforementioned viewing conditions. Then, we develop a hybrid model that combines white-box processing stages, for modeling the effects of luminance masking and contrast sensitivity, with a black-box deep neural network. We demonstrate that the novel hybrid model handles changes in viewing conditions correctly and outperforms state-of-the-art metrics.
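The viewing conditions enter such a metric as physical quantities; a small worked example of the usual conversions is given below (our sketch with assumed display parameters and a simple gamma model, not the paper's exact calibration): viewing distance and screen geometry give pixels per visual degree, and display brightness maps pixel values to luminance in cd/m².

```python
import math

def pixels_per_degree(screen_width_px, screen_width_m, viewing_distance_m):
    """Average pixels per degree of visual angle across the screen width."""
    degrees = 2.0 * math.degrees(math.atan(screen_width_m / (2.0 * viewing_distance_m)))
    return screen_width_px / degrees

def pixel_luminance(v, peak_luminance_cd_m2=100.0, gamma=2.2):
    """Map a normalized pixel value in [0, 1] to cd/m^2 with a gamma display model."""
    return peak_luminance_cd_m2 * (v ** gamma)

print(round(pixels_per_degree(1920, 0.53, 0.8), 1), "px/deg")    # ~52 px/deg
print(round(pixel_luminance(0.5), 1), "cd/m^2")                  # ~21.8 cd/m^2
```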
{"title":"Predicting Visible Image Differences Under Varying Display Brightness and Viewing Distance","authors":"Nanyang Ye, Krzysztof Wolski, Rafał K. Mantiuk","doi":"10.1109/CVPR.2019.00558","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00558","url":null,"abstract":"Numerous applications require a robust metric that can predict whether image differences are visible or not. However, the accuracy of existing white-box visibility metrics, such as HDR-VDP, is often not good enough. CNN-based black-box visibility metrics have proven to be more accurate, but they cannot account for differences in viewing conditions, such as display brightness and viewing distance. In this paper, we propose a CNN-based visibility metric, which maintains the accuracy of deep network solutions and accounts for viewing conditions. To achieve this, we extend the existing dataset of locally visible differences (LocVis) with a new set of measurements, collected considering aforementioned viewing conditions. Then, we develop a hybrid model that combines white-box processing stages for modeling the effects of luminance masking and contrast sensitivity, with a black-box deep neural network. We demonstrate that the novel hybrid model can handle the change of viewing conditions correctly and outperforms state-of-the-art metrics.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"53 1","pages":"5429-5437"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85621061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Learning Words by Drawing Images
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00213
Dídac Surís, Adrià Recasens, David Bau, David F. Harwath, James R. Glass, A. Torralba
We propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes, from a dataset of spoken descriptions of images. Building upon recent findings that GAN representations can be manipulated to edit semantic concepts in the generated output, we propose a new method to use such GAN-generated images to train a model using a triplet loss. To apply the method, we develop Audio CLEVRGAN, a new dataset of audio descriptions of GAN-generated CLEVR images, and we describe a training procedure that creates a curriculum of GAN-generated images that focuses training on image pairs that differ in a specific, informative way. Training is done without additional supervision beyond the spoken captions and the GAN. We find that training that takes advantage of GAN-generated edited examples results in improvements in the model's ability to learn attributes compared to previous results. Our proposed learning framework also results in models that can associate spoken words with some abstract visual concepts such as color and size.
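The triplet objective can be sketched as follows (a minimal illustration with assumed embedding dimensions and margin; the actual audio and image encoders are omitted): the embedding of a spoken caption should be closer to its matching GAN-generated image than to an edited version that differs in one attribute.

```python
import torch
import torch.nn.functional as F

def triplet_loss(audio_emb, pos_img_emb, neg_img_emb, margin=1.0):
    """Hinge on the distance gap between the matching and the edited image."""
    d_pos = F.pairwise_distance(audio_emb, pos_img_emb)
    d_neg = F.pairwise_distance(audio_emb, neg_img_emb)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

audio = F.normalize(torch.randn(4, 128), dim=1)   # spoken-caption embeddings
pos = F.normalize(torch.randn(4, 128), dim=1)     # matching GAN image embeddings
neg = F.normalize(torch.randn(4, 128), dim=1)     # edited (attribute-changed) images
print(triplet_loss(audio, pos, neg))
```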
{"title":"Learning Words by Drawing Images","authors":"Dídac Surís, Adrià Recasens, David Bau, David F. Harwath, James R. Glass, A. Torralba","doi":"10.1109/CVPR.2019.00213","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00213","url":null,"abstract":"We propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes, from a dataset of spoken descriptions of images. Building upon recent findings that GAN representations can be manipulated to edit semantic concepts in the generated output, we propose a new method to use such GAN-generated images to train a model using a triplet loss. To apply the method, we develop Audio CLEVRGAN, a new dataset of audio descriptions of GAN-generated CLEVR images, and we describe a training procedure that creates a curriculum of GAN-generated images that focuses training on image pairs that differ in a specific, informative way. Training is done without additional supervision beyond the spoken captions and the GAN. We find that training that takes advantage of GAN-generated edited examples results in improvements in the model's ability to learn attributes compared to previous results. Our proposed learning framework also results in models that can associate spoken words with some abstract visual concepts such as color and size.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"213 1 1","pages":"2029-2038"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85642682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
BAD SLAM: Bundle Adjusted Direct RGB-D SLAM
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00022
Thomas Schöps, Torsten Sattler, M. Pollefeys
A key component of Simultaneous Localization and Mapping (SLAM) systems is the joint optimization of the estimated 3D map and camera trajectory. Bundle adjustment (BA) is the gold standard for this. Due to the large number of variables in dense RGB-D SLAM, previous work has focused on approximating BA. In contrast, in this paper we present a novel, fast direct BA formulation which we implement in a real-time dense RGB-D SLAM algorithm. In addition, we show that direct RGB-D SLAM systems are highly sensitive to rolling shutter, RGB and depth sensor synchronization, and calibration errors. In order to facilitate state-of-the-art research on direct RGB-D SLAM, we propose a novel, well-calibrated benchmark for this task that uses synchronized global shutter RGB and depth cameras. It includes a training set, a test set without public ground truth, and an online evaluation service. We observe that the ranking of methods changes on this dataset compared to existing ones, and our proposed algorithm outperforms all other evaluated SLAM methods. Our benchmark and our open source SLAM algorithm are available at: www.eth3d.net
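To give a flavor of what a direct RGB-D bundle adjustment optimizes (a heavily simplified sketch under assumed conventions, not the BAD SLAM cost function), the snippet below evaluates a geometric residual: a scene point is projected into a keyframe and its predicted depth is compared against the measured depth image.

```python
import numpy as np

def depth_residual(point_w, T_cw, K, depth_image):
    """Geometric residual of one scene point observed in one keyframe."""
    p_c = (T_cw @ np.append(point_w, 1.0))[:3]    # world -> camera frame
    u, v = (K @ (p_c / p_c[2]))[:2]               # perspective projection
    measured = depth_image[int(round(v)), int(round(u))]
    return measured - p_c[2]                      # measured minus predicted depth

K = np.array([[525.0, 0.0, 32.0], [0.0, 525.0, 32.0], [0.0, 0.0, 1.0]])
T_cw = np.eye(4)
depth = np.full((64, 64), 2.0)                    # synthetic flat depth image
print(depth_residual(np.array([0.0, 0.0, 2.0]), T_cw, K, depth))   # -> 0.0
```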
{"title":"BAD SLAM: Bundle Adjusted Direct RGB-D SLAM","authors":"Thomas Schöps, Torsten Sattler, M. Pollefeys","doi":"10.1109/CVPR.2019.00022","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00022","url":null,"abstract":"A key component of Simultaneous Localization and Mapping (SLAM) systems is the joint optimization of the estimated 3D map and camera trajectory. Bundle adjustment (BA) is the gold standard for this. Due to the large number of variables in dense RGB-D SLAM, previous work has focused on approximating BA. In contrast, in this paper we present a novel, fast direct BA formulation which we implement in a real-time dense RGB-D SLAM algorithm. In addition, we show that direct RGB-D SLAM systems are highly sensitive to rolling shutter, RGB and depth sensor synchronization, and calibration errors. In order to facilitate state-of-the-art research on direct RGB-D SLAM, we propose a novel, well-calibrated benchmark for this task that uses synchronized global shutter RGB and depth cameras. It includes a training set, a test set without public ground truth, and an online evaluation service. We observe that the ranking of methods changes on this dataset compared to existing ones, and our proposed algorithm outperforms all other evaluated SLAM methods. Our benchmark and our open source SLAM algorithm are available at: www.eth3d.net","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"23 1","pages":"134-144"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85972063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 174
Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00309
Jian Liang, R. He, Zhenan Sun, T. Tan
Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods rely heavily on large source domains and are computationally expensive to train, while subspace learning methods have a quadratic time complexity that scales poorly with domain size. This paper provides a simple and efficient solution, which can be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to multi-source settings and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our method yields state-of-the-art results in various domain adaptation tasks.
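The alternating scheme can be sketched in a few lines (a simplified illustration that drops the learned subspace and the full objective; the data, features, and iteration count are assumptions): pseudo-label target samples by their nearest centroid, re-estimate the centroids from those labels, and repeat.

```python
import numpy as np

def centroid_shift(source_centroids, target_feats, n_iters=10):
    """Alternate between nearest-centroid pseudo-labeling and centroid updates."""
    centroids = source_centroids.copy()
    for _ in range(n_iters):
        dists = np.linalg.norm(target_feats[:, None, :] - centroids[None], axis=2)
        pseudo = dists.argmin(axis=1)             # pseudo target labels
        for c in range(len(centroids)):
            if np.any(pseudo == c):
                centroids[c] = target_feats[pseudo == c].mean(axis=0)
    return pseudo, centroids

rng = np.random.default_rng(1)
src_centroids = np.array([[0.0, 0.0], [4.0, 4.0]])    # class centroids from source data
tgt = np.vstack([rng.normal(1.0, 0.3, (20, 2)),       # shifted target clusters
                 rng.normal(5.0, 0.3, (20, 2))])
labels, _ = centroid_shift(src_centroids, tgt)
print(labels)
```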
{"title":"Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation","authors":"Jian Liang, R. He, Zhenan Sun, T. Tan","doi":"10.1109/CVPR.2019.00309","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00309","url":null,"abstract":"Conventional domain adaptation methods usually resort to deep neural networks or subspace learning to find invariant representations across domains. However, most deep learning methods highly rely on large-size source domains and are computationally expensive to train, while subspace learning methods always have a quadratic time complexity that suffers from the large domain size. This paper provides a simple and efficient solution, which could be regarded as a well-performing baseline for domain adaptation tasks. Our method is built upon the nearest centroid classifier, seeking a subspace where the centroids in the target domain are moderately shifted from those in the source domain. Specifically, we design a unified objective without accessing the source domain data and adopt an alternating minimization scheme to iteratively discover the pseudo target labels, invariant subspace, and target centroids. Besides its privacy-preserving property (distant supervision), the algorithm is provably convergent and has a promising linear time complexity. In addition, the proposed method can be readily extended to multi-source setting and domain generalization, and it remarkably enhances popular deep adaptation methods by borrowing the learned transferable features. Extensive experiments on several benchmarks including object, digit, and face recognition datasets validate that our methods yield state-of-the-art results in various domain adaptation tasks.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"24 1","pages":"2970-2979"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73073990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 76
UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00826
Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yezhou Yang, W. Xu
In this paper, we propose UnOS, a unified system for unsupervised optical flow and stereo depth estimation using convolutional neural networks (CNNs), which takes advantage of their inherent geometrical consistency under the rigid-scene assumption. UnOS significantly outperforms other state-of-the-art (SOTA) unsupervised approaches that treat the two tasks independently. Specifically, given two consecutive stereo image pairs from a video, UnOS estimates per-pixel stereo depth images, camera ego-motion, and optical flow with three parallel CNNs. Based on these quantities, UnOS computes rigid optical flow and compares it against the optical flow estimated from the FlowNet, yielding pixels that satisfy the rigid-scene assumption. Then, we encourage geometrical consistency between the two estimated flows within rigid regions, from which we derive a rigid-aware direct visual odometry (RDVO) module. We also propose rigid and occlusion-aware flow-consistency losses for the learning of UnOS. We evaluate our results on the popular KITTI dataset over four related tasks, i.e., stereo depth, optical flow, visual odometry, and motion segmentation.
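The "rigid optical flow" used for the comparison can be written down directly (our sketch under assumed camera conventions): back-project each pixel with its depth, apply the estimated ego-motion, re-project, and take the pixel displacement.

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Optical flow induced purely by camera motion, given per-pixel depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)            # back-project
    pts2 = (T @ np.vstack([pts, np.ones((1, pts.shape[1]))]))[:3]  # apply ego-motion
    proj = K @ (pts2 / pts2[2:3])                                  # re-project
    return (proj[:2] - pix[:2]).reshape(2, h, w)

K = np.array([[100.0, 0.0, 16.0], [0.0, 100.0, 16.0], [0.0, 0.0, 1.0]])
T = np.eye(4); T[0, 3] = 0.1                                       # 0.1 m sideways motion
depth = np.full((32, 32), 5.0)
print(rigid_flow(depth, K, T)[:, 16, 16])                          # ~[2, 0] px at the center
```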
{"title":"UnOS: Unified Unsupervised Optical-Flow and Stereo-Depth Estimation by Watching Videos","authors":"Yang Wang, Peng Wang, Zhenheng Yang, Chenxu Luo, Yezhou Yang, W. Xu","doi":"10.1109/CVPR.2019.00826","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00826","url":null,"abstract":"In this paper, we propose UnOS, an unified system for unsupervised optical flow and stereo depth estimation using convolutional neural network (CNN) by taking advantages of their inherent geometrical consistency based on the rigid-scene assumption. UnOS significantly outperforms other state-of-the-art (SOTA) unsupervised approaches that treated the two tasks independently. Specifically, given two consecutive stereo image pairs from a video, UnOS estimates per-pixel stereo depth images, camera ego-motion and optical flow with three parallel CNNs. Based on these quantities, UnOS computes rigid optical flow and compares it against the optical flow estimated from the FlowNet, yielding pixels satisfying the rigid-scene assumption. Then, we encourage geometrical consistency between the two estimated flows within rigid regions, from which we derive a rigid-aware direct visual odometry (RDVO) module. We also propose rigid and occlusion-aware flow-consistency losses for the learning of UnOS. We evaluated our results on the popular KITTI dataset over 4 related tasks, ie stereo depth, optical flow, visual odometry and motion segmentation.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"2 1","pages":"8063-8073"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74709738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 134
Local to Global Learning: Gradually Adding Classes for Training Deep Neural Networks
Pub Date : 2019-06-01 DOI: 10.1109/CVPR.2019.00488
Hao Cheng, Dongze Lian, Bowen Deng, Shenghua Gao, T. Tan, Yanlin Geng
We propose a new learning paradigm, Local to Global Learning (LGL), for Deep Neural Networks (DNNs) to improve the performance of classification problems. The core of LGL is to learn a DNN model gradually, from fewer categories (local) to more categories (global), within the entire training set. LGL is most closely related to the Self-Paced Learning (SPL) algorithm, but its formulation is different: SPL orders training data from simple to complex, while LGL proceeds from local to global categories. In this paper, we incorporate the idea of LGL into the learning objective of DNNs and explain from an information-theoretic perspective why LGL works better. Experiments on toy data and the CIFAR-10, CIFAR-100, and ImageNet datasets show that LGL outperforms the baseline and SPL-based algorithms.
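A toy version of the schedule looks like the sketch below (our illustration with scikit-learn; the synthetic data, the class schedule, and warm-starting via partial_fit are assumptions rather than the paper's setup): training starts on a local subset of classes and the class set grows until it is global.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(6, 20))     # six well-separated class centers
y = np.repeat(np.arange(6), 100)
X = centers[y] + rng.normal(size=(600, 20))
all_classes = np.arange(6)

clf = SGDClassifier(random_state=0)
for n_active in (2, 4, 6):                        # local -> global class schedule
    mask = y < n_active                           # train only on the active classes
    for _ in range(5):                            # a few passes per stage
        clf.partial_fit(X[mask], y[mask], classes=all_classes)
print("accuracy on all classes:", clf.score(X, y))
```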
{"title":"Local to Global Learning: Gradually Adding Classes for Training Deep Neural Networks","authors":"Hao Cheng, Dongze Lian, Bowen Deng, Shenghua Gao, T. Tan, Yanlin Geng","doi":"10.1109/CVPR.2019.00488","DOIUrl":"https://doi.org/10.1109/CVPR.2019.00488","url":null,"abstract":"We propose a new learning paradigm, Local to Global Learning (LGL), for Deep Neural Networks (DNNs) to improve the performance of classification problems. The core of LGL is to learn a DNN model from fewer categories (local) to more categories (global) gradually within the entire training set. LGL is most related to the Self-Paced Learning (SPL) algorithm but its formulation is different from SPL. SPL trains its data from simple to complex, while LGL from local to global. In this paper, we incorporate the idea of LGL into the learning objective of DNNs and explain why LGL works better from an information-theoretic perspective. Experiments on the toy data, CIFAR-10, CIFAR-100, and ImageNet dataset show that LGL outperforms the baseline and SPL-based algorithms.","PeriodicalId":6711,"journal":{"name":"2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"947 1","pages":"4743-4751"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77578172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11