
2018 15th Conference on Computer and Robot Vision (CRV): Latest Publications

Convolutional Neural Networks Regularized by Correlated Noise
Pub Date : 2018-04-03 DOI: 10.1109/CRV.2018.00059
Shamak Dutta, B. Tripp, Graham W. Taylor
Neurons in the visual cortex are correlated in their variability. The presence of correlation impacts cortical processing because noise cannot be averaged out over many neurons. In an effort to understand the functional purpose of correlated variability, we implement and evaluate correlated noise models in deep convolutional neural networks. Inspired by the cortex, correlation is defined as a function of the distance between neurons and their selectivity. We show how to sample from high-dimensional correlated distributions while keeping the procedure differentiable, so that back-propagation can proceed as usual. The impact of correlated variability is evaluated on the classification of occluded and non-occluded images with and without the presence of other regularization techniques, such as dropout. More work is needed to understand the effects of correlations in various conditions; however, in 10/12 of the cases we studied, the best performance on occluded images was obtained from a model with correlated noise.
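As a rough illustration of the differentiable sampling step described in the abstract, the sketch below draws correlated Gaussian noise through a Cholesky factor of a distance-based covariance and injects it multiplicatively into a layer's activations. The exponential-distance kernel, the 8x8 unit grid, and the multiplicative injection are illustrative assumptions, not the authors' exact correlation model.

```python
import torch

def distance_covariance(coords, sigma=1.0, length_scale=2.0):
    # Stand-in for the paper's distance/selectivity rule: correlation decays
    # exponentially with the spatial distance between units.
    d = torch.cdist(coords, coords)                   # pairwise distances
    cov = sigma ** 2 * torch.exp(-d / length_scale)
    return cov + 1e-4 * torch.eye(coords.shape[0])    # jitter keeps it positive-definite

def sample_correlated_noise(cov, batch_size):
    # Reparameterization: eps ~ N(0, I), noise = eps @ L^T stays differentiable
    # with respect to anything that parameterizes the covariance.
    L = torch.linalg.cholesky(cov)
    eps = torch.randn(batch_size, cov.shape[0])
    return eps @ L.T                                   # each row ~ N(0, cov)

# Inject multiplicative correlated noise into the activations of an 8x8 unit grid.
grid = torch.stack(torch.meshgrid(torch.arange(8.0), torch.arange(8.0),
                                  indexing="ij"), dim=-1).reshape(-1, 2)
cov = distance_covariance(grid)
activations = torch.randn(32, 64)                      # batch of 32, 64 flattened units
noisy = activations * (1.0 + sample_correlated_noise(cov, 32))
```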
Citations: 5
Deep Learning Object Detection Methods for Ecological Camera Trap Data
Pub Date : 2018-03-28 DOI: 10.1109/CRV.2018.00052
Stefan Schneider, Graham W. Taylor, S. C. Kremer
Deep learning methods for computer vision tasks show promise for automating the data analysis of camera trap images. Ecological camera traps are a common approach for monitoring an ecosystem's animal population, as they provide continual insight into an environment without being intrusive. However, the analysis of camera trap images is expensive, labour-intensive, and time-consuming. Recent advances in the field of deep learning for object detection show promise towards automating the analysis of camera trap images. Here, we demonstrate their capabilities by training and comparing two deep learning object detection classifiers, Faster R-CNN and YOLO v2.0, to identify, quantify, and localize animal species within camera trap images using the Reconyx Camera Trap and the self-labeled Gold Standard Snapshot Serengeti data sets. When trained on large labeled datasets, object recognition methods have shown success. We demonstrate their use, in the context of realistically sized ecological data sets, by testing if object detection methods are applicable for ecological research scenarios when utilizing transfer learning. Faster R-CNN outperformed YOLO v2.0 with average accuracies of 93.0% and 76.7% on the two data sets, respectively. Our findings show promising steps towards the automation of the laborious task of labeling camera trap images, which can be used to improve our understanding of the population dynamics of ecosystems across the planet.
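For readers who want to reproduce the transfer-learning setup at a high level, a minimal sketch with torchvision's COCO-pretrained Faster R-CNN is shown below; the ResNet-50-FPN backbone, the placeholder species count, and the toy target are assumptions, not the exact configuration used in the paper.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_camera_trap_detector(num_species):
    # Start from a COCO-pretrained detector and replace the box predictor so
    # the head matches the number of animal classes (+1 for background).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_species + 1)
    return model

model = build_camera_trap_detector(num_species=48)       # placeholder class count
images = [torch.rand(3, 512, 512)]                        # one fake camera-trap frame
targets = [{"boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)                           # dict of detection losses
sum(losses.values()).backward()                           # gradients for one fine-tuning step
```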
Citations: 128
Generalized Hadamard-Product Fusion Operators for Visual Question Answering
Pub Date : 2018-03-26 DOI: 10.1109/CRV.2018.00016
Brendan Duke, Graham W. Taylor
We propose a generalized class of multimodal fusion operators for the task of visual question answering (VQA). We identify generalizations of existing multimodal fusion operators based on the Hadamard product, and show that specific non-trivial instantiations of this generalized fusion operator exhibit superior performance in terms of OpenEnded accuracy on the VQA task. In particular, we introduce Nonlinearity Ensembling, Feature Gating, and post-fusion neural network layers as fusion operator components, culminating in an absolute percentage point improvement of 1.1% on the VQA 2.0 test-dev set over baseline fusion operators, which use the same features as input. We use our findings as evidence that our generalized class of fusion operators could lead to the discovery of even superior task-specific operators when used as a search space in an architecture search over fusion operators.
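A toy PyTorch rendering of a gated Hadamard-product fusion operator is sketched below to make the idea concrete; the projection sizes, the sigmoid gate on the image branch, the tanh nonlinearities, and the post-fusion MLP are illustrative choices, not the specific instantiations evaluated in the paper.

```python
import torch
import torch.nn as nn

class GatedHadamardFusion(nn.Module):
    # Project both modalities into a common space, gate the image branch,
    # fuse with an element-wise (Hadamard) product, then refine with a
    # small post-fusion network that produces answer logits.
    def __init__(self, img_dim=2048, q_dim=1024, hidden=1200, n_answers=3000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        self.gate = nn.Sequential(nn.Linear(img_dim, hidden), nn.Sigmoid())
        self.post = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_answers))

    def forward(self, img_feat, q_feat):
        fused = torch.tanh(self.img_proj(img_feat)) * self.gate(img_feat) \
                * torch.tanh(self.q_proj(q_feat))
        return self.post(fused)

logits = GatedHadamardFusion()(torch.randn(4, 2048), torch.randn(4, 1024))
```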
Citations: 6
Real-Time End-to-End Action Detection with Two-Stream Networks
Pub Date : 2018-02-23 DOI: 10.1109/CRV.2018.00015
Alaaeldin El-Nouby, Graham W. Taylor
Two-stream networks have been very successful for solving the problem of action detection. However, prior work using two-stream networks trains both streams separately, which prevents the network from exploiting regularities between the two streams. Moreover, unlike the visual stream, the dominant forms of optical flow computation typically do not maximally exploit GPU parallelism. We present a real-time end-to-end trainable two-stream network for action detection. First, we integrate the optical flow computation in our framework by using Flownet2. Second, we apply early fusion for the two streams and train the whole pipeline jointly end-to-end. Finally, for better network initialization, we transfer from the task of action recognition to action detection by pre-training our framework using the recently released large-scale Kinetics dataset. Our experimental results show that training the pipeline jointly end-to-end with fine-tuning the optical flow for the objective of action detection improves detection performance significantly. Additionally, we observe an improvement when initializing with parameters pre-trained using Kinetics. Last, we show that by integrating the optical flow computation, our framework is more efficient, running at real-time speeds (up to 31 fps).
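The sketch below shows early fusion in its simplest form: appearance and motion inputs are stacked along the channel axis and processed by one shared trunk, so the detection loss trains both streams jointly. It is a toy trunk under assumed input shapes (3-channel RGB plus 2-channel flow) and does not include the FlowNet2 integration or the detection head.

```python
import torch
import torch.nn as nn

class EarlyFusionTwoStream(nn.Module):
    # Concatenate RGB and optical-flow channels, then learn one shared
    # representation; gradients from the task loss reach both streams.
    def __init__(self, num_classes=24):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3 + 2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes))

    def forward(self, rgb, flow):
        return self.trunk(torch.cat([rgb, flow], dim=1))

logits = EarlyFusionTwoStream()(torch.randn(2, 3, 224, 224),
                                torch.randn(2, 2, 224, 224))
```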
Citations: 28
Tiny SSD: A Tiny Single-Shot Detection Deep Convolutional Neural Network for Real-Time Embedded Object Detection
Pub Date : 2018-02-19 DOI: 10.1109/CRV.2018.00023
A. Wong, M. Shafiee, Francis Li, Brendan Chwyl
Object detection is a major challenge in computer vision, involving both object classification and object localization within a scene. While deep neural networks have been shown in recent years to yield very powerful techniques for tackling the challenge of object detection, one of the biggest challenges with enabling such object detection networks for widespread deployment on embedded devices is high computational and memory requirements. Recently, there has been an increasing focus in exploring small deep neural network architectures for object detection that are more suitable for embedded devices, such as Tiny YOLO and SqueezeDet. Inspired by the efficiency of the Fire microarchitecture introduced in SqueezeNet and the object detection performance of the single-shot detection macroarchitecture introduced in SSD, this paper introduces Tiny SSD, a single-shot detection deep convolutional neural network for real-time embedded object detection that is composed of a highly optimized, non-uniform Fire subnetwork stack and a non-uniform sub-network stack of highly optimized SSD-based auxiliary convolutional feature layers designed specifically to minimize model size while maintaining object detection performance. The resulting Tiny SSD possesses a model size of 2.3MB (~26X smaller than Tiny YOLO) while still achieving an mAP of 61.3% on VOC 2007 (~4.2% higher than Tiny YOLO). These experimental results show that very small deep neural network architectures can be designed for real-time object detection that are well-suited for embedded scenarios.
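Since the backbone is built from Fire modules, a minimal PyTorch version of the SqueezeNet-style Fire microarchitecture is sketched below; the channel counts are illustrative rather than the non-uniform configuration used in Tiny SSD.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    # A 1x1 "squeeze" convolution reduces channels, then parallel 1x1 and
    # 3x3 "expand" convolutions restore them and are concatenated.
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.act(self.squeeze(x))
        return torch.cat([self.act(self.expand1x1(s)),
                          self.act(self.expand3x3(s))], dim=1)

out = Fire(64, 16, 64, 64)(torch.randn(1, 64, 38, 38))   # -> (1, 128, 38, 38)
```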
Citations: 127
Nature vs. Nurture: The Role of Environmental Resources in Evolutionary Deep Intelligence
Pub Date : 2018-02-09 DOI: 10.1109/CRV.2018.00058
A. Chung, P. Fieguth, A. Wong
Evolutionary deep intelligence synthesizes highly efficient deep neural network architectures over successive generations. Inspired by the nature versus nurture debate, we propose a study to examine the role of external factors on the network synthesis process by varying the availability of simulated environmental resources. Experimental results were obtained for networks synthesized via asexual evolutionary synthesis (1-parent) and sexual evolutionary synthesis (2-parent, 3-parent, and 5-parent) using a 10% subset of the MNIST dataset. Results show that a lower environmental factor model resulted in a more gradual loss in performance accuracy and decrease in storage size. This potentially allows significantly reduced storage size with minimal to no drop in performance accuracy, and the best networks were synthesized using the lowest environmental factor models.
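To make the "environmental resources" idea concrete, the toy sketch below keeps each synapse with a probability scaled by an environmental factor, so lower factors prune more aggressively from one generation to the next. The magnitude-based survival probability and the single-parent update are illustrative assumptions, not the paper's synthesis model.

```python
import numpy as np

def synthesize_offspring(parent_weights, env_factor, rng):
    # Each synapse survives with probability proportional to its relative
    # magnitude, scaled by the available environmental resources (0, 1].
    magnitude = np.abs(parent_weights)
    p_survive = env_factor * magnitude / (magnitude.max() + 1e-12)
    mask = rng.random(parent_weights.shape) < p_survive
    return parent_weights * mask, mask.mean()            # offspring weights, density

rng = np.random.default_rng(0)
parent = rng.normal(size=(256, 256))
child, density = synthesize_offspring(parent, env_factor=0.7, rng=rng)
print(f"surviving synapses: {density:.1%}")
```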
Citations: 1
In Defense of Classical Image Processing: Fast Depth Completion on the CPU
Pub Date : 2018-01-31 DOI: 10.1109/CRV.2018.00013
Jason Ku, Ali Harakeh, Steven L. Waslander
With the rise of data driven deep neural networks as a realization of universal function approximators, most research on computer vision problems has moved away from handcrafted classical image processing algorithms. This paper shows that with a well-designed algorithm, we are capable of outperforming neural network based methods on the task of depth completion. The proposed algorithm is simple and fast, runs on the CPU, and relies only on basic image processing operations to perform depth completion of sparse LIDAR depth data. We evaluate our algorithm on the challenging KITTI depth completion benchmark, and at the time of submission, our method ranks first on the KITTI test server among all published methods. Furthermore, our algorithm is data independent, requiring no training data to perform the task at hand. The code written in Python is publicly available at https://github.com/kujason/ip_basic.
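A heavily simplified Python/OpenCV sketch in the same classical spirit is shown below: invert the valid depths so near points win the dilation, grow and close them morphologically, blur, and invert back. The kernel shapes, thresholds, and step order are simplifications of the published pipeline; see the linked ip_basic repository for the actual implementation.

```python
import cv2
import numpy as np

def complete_depth(sparse_depth, max_depth=100.0):
    # Classical-morphology depth completion on a sparse lidar depth map.
    depth = sparse_depth.astype(np.float32).copy()
    valid = depth > 0.1
    depth[valid] = max_depth - depth[valid]              # invert so near points dominate
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    depth = cv2.dilate(depth, kernel)                    # grow sparse measurements
    depth = cv2.morphologyEx(depth, cv2.MORPH_CLOSE, kernel)
    depth = cv2.GaussianBlur(depth, (5, 5), 0)           # smooth filled regions
    valid = depth > 0.1
    depth[valid] = max_depth - depth[valid]              # invert back to metric depth
    return depth

dense = complete_depth(np.zeros((352, 1216), dtype=np.float32))  # KITTI-sized dummy input
```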
Citations: 197
Learning a Bias Correction for Lidar-Only Motion Estimation
Pub Date : 2018-01-15 DOI: 10.1109/CRV.2018.00032
T. Y. Tang, David J. Yoon, F. Pomerleau, T. Barfoot
This paper presents a novel technique to correct for bias in a classical estimator using a learning approach. We apply a learned bias correction to a lidar-only motion estimation pipeline. Our technique trains a Gaussian process (GP) regression model using data with ground truth. The inputs to the model are high-level features derived from the geometry of the point-clouds, and the outputs are the predicted biases between poses computed by the estimator and the ground truth. The predicted biases are applied as a correction to the poses computed by the estimator. Our technique is evaluated on over 50km of lidar data, which includes the KITTI odometry benchmark and lidar datasets collected around the University of Toronto campus. After applying the learned bias correction, we obtained significant improvements to lidar odometry in all datasets tested. We achieved around 10% reduction in errors on all datasets from an already accurate lidar odometry algorithm, at the expense of only less than 1% increase in computational cost at run-time.
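The sketch below shows the general pattern with scikit-learn's GaussianProcessRegressor on synthetic data: fit a GP from per-frame geometry features to the observed pose bias, then subtract the predicted bias at test time. The feature vector, kernel, and one-dimensional "pose" are placeholders, not the features or estimator used in the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))                        # per-frame point-cloud statistics
bias = 0.05 * features[:, 0] + 0.01 * rng.normal(size=200)  # synthetic ground-truth bias

# Train the GP bias model on frames where ground truth is available.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(features, bias)

# At run-time, correct the estimator's output by the predicted bias.
estimated_pose = np.array([1.20])                           # one toy pose component
predicted_bias, std = gp.predict(features[:1], return_std=True)
corrected_pose = estimated_pose - predicted_bias
```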
Citations: 28
Real-Time Deep Hair Matting on Mobile Devices
Pub Date : 2017-12-19 DOI: 10.1109/CRV.2018.00011
Alex Levinshtein, Cheng Chang, Edmund Phung, I. Kezele, W. Guo, P. Aarabi
Augmented reality is an emerging technology in many application domains. Among them is the beauty industry, where live virtual try-on of beauty products is of great importance. In this paper, we address the problem of live hair color augmentation. To achieve this goal, hair needs to be segmented quickly and accurately. We show how a modified MobileNet CNN architecture can be used to segment the hair in real-time. Instead of training this network using large amounts of accurate segmentation data, which is difficult to obtain, we use crowd-sourced hair segmentation data. While such data is much simpler to obtain, the segmentations it provides are noisy and coarse. Despite this, we show how our system can produce accurate and fine-detailed hair mattes, while running at over 30 fps on an iPad Pro tablet.
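The overall network shape can be sketched as a MobileNetV2 encoder with a small upsampling decoder producing a per-pixel hair probability, as below; the decoder layers, the ImageNet-pretrained torchvision backbone, and the single bilinear upsample are assumptions standing in for the paper's modified architecture and training losses.

```python
import torch
import torch.nn as nn
import torchvision

class HairMatteNet(nn.Module):
    # MobileNetV2 feature extractor followed by a lightweight decoder that
    # predicts a hair matte at the input resolution.
    def __init__(self):
        super().__init__()
        self.encoder = torchvision.models.mobilenet_v2(weights="DEFAULT").features
        self.decoder = nn.Sequential(
            nn.Conv2d(1280, 64, 1), nn.ReLU(),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, x):
        return torch.sigmoid(self.decoder(self.encoder(x)))  # matte in [0, 1]

matte = HairMatteNet()(torch.randn(1, 3, 224, 224))           # -> (1, 1, 224, 224)
```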
Citations: 23
WAYLA - Generating Images from Eye Movements
Pub Date : 2017-11-21 DOI: 10.1109/CRV.2018.00026
Bingqing Yu, James J. Clark
We present a method for reconstructing images viewed by observers based only on their eye movements. By exploring the relationships between gaze patterns and image stimuli, the "What Are You Looking At?" (WAYLA) system has the goal of synthesizing photo-realistic images that are similar to the original pictures being viewed. The WAYLA approach is based on the Conditional Generative Adversarial Network (Conditional GAN) image-to-image translation technique of Isola et al. We consider two specific applications - the first of reconstructing newspaper images from gaze heat maps and the second of detailed reconstruction of images containing only text. The newspaper image reconstruction process is divided into two image-to-image translation operations: the first maps gaze heat maps into image segmentations, and the second maps the generated segmentation into a newspaper image. We validate the performance of our approach using various evaluation metrics along with human visual inspection. All results confirm the ability of our network to perform image generation tasks using eye tracking data.
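As a schematic of the conditional image-to-image setup, the sketch below defines a tiny pix2pix-style generator that maps a one-channel gaze heat map to a three-channel image; a real system would pair it with a PatchGAN discriminator and an adversarial-plus-L1 objective, and the layer sizes here are arbitrary.

```python
import torch
import torch.nn as nn

class GazeToImageGenerator(nn.Module):
    # Encoder-decoder that conditions image synthesis on a gaze heat map.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, heatmap):
        return self.net(heatmap)

fake_page = GazeToImageGenerator()(torch.randn(1, 1, 256, 256))  # -> (1, 3, 256, 256)
```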
Citations: 0