
Latest publications from the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Efficient 3D Room Shape Recovery from a Single Panorama
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.585
Hao Yang, Hui Zhang
We propose a method to recover the shape of a 3D room from a full-view indoor panorama. Our algorithm can automatically infer a 3D shape from a collection of partially oriented superpixel facets and line segments. The core part of the algorithm is a constraint graph, which includes lines and superpixels as vertices and encodes their geometric relations as edges. A novel approach is proposed to perform 3D reconstruction based on the constraint graph by solving all the geometric constraints as a constrained linear least-squares problem. The constraints used for reconstruction are selected using an occlusion detection method with a Markov random field. Experiments show that our method can recover room shapes that cannot be addressed by previous approaches. Our method is also efficient: the inference time for each panorama is less than one minute.
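The reconstruction step boils down to solving a sparse linear system in a least-squares sense. Below is a minimal sketch, not the authors' implementation, of how a few linear geometric constraints on hypothetical per-facet depth variables could be solved with NumPy; the constraint matrix and its interpretation are illustrative assumptions.

```python
import numpy as np

# Hypothetical example: 4 unknown depths (one per superpixel facet).
# Each row of A encodes one linear geometric constraint A @ d = b,
# e.g. "facet 0 and facet 1 meet at the same depth along a shared line".
A = np.array([
    [1.0, -1.0,  0.0,  0.0],   # d0 - d1 = 0  (adjacent, coplanar facets)
    [0.0,  1.0, -1.0,  0.0],   # d1 - d2 = 0
    [0.0,  0.0,  1.0, -1.0],   # d2 - d3 = 0
    [1.0,  0.0,  0.0,  0.0],   # d0 = 1       (fix the global scale)
])
b = np.array([0.0, 0.0, 0.0, 1.0])

# Solve the (possibly over-determined) system in the least-squares sense.
depths, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(depths)   # -> [1. 1. 1. 1.]
```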
Citations: 73
Split and Match: Example-Based Adaptive Patch Sampling for Unsupervised Style Transfer
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.66
Oriel Frigo, Neus Sabater, J. Delon, P. Hellier
This paper presents a novel unsupervised method to transfer the style of an example image to a source image. The complex notion of image style is here considered as a local texture transfer, possibly coupled with a global color transfer. For the local texture transfer, we propose a new method based on an adaptive patch partition that captures the style of the example image and preserves the structure of the source image. More precisely, this example-based partition predicts how well a source patch matches an example patch. Results on various images show that our method outperforms the most recent techniques.
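As a rough illustration of the example-based matching idea, the sketch below scores how well a source patch matches candidate example patches with a simple sum-of-squared-differences criterion; the adaptive partition and the actual matching cost in the paper are more elaborate, and all data here are random placeholders.

```python
import numpy as np

def best_matching_patch(source_patch, example_patches):
    """Return the index of the example patch closest to the source patch
    under a sum-of-squared-differences (SSD) criterion."""
    ssd = [np.sum((source_patch - p) ** 2) for p in example_patches]
    return int(np.argmin(ssd))

rng = np.random.default_rng(0)
source = rng.random((8, 8))
examples = [rng.random((8, 8)) for _ in range(50)] + [source + 0.01]
print(best_matching_patch(source, examples))  # -> 50 (the near-identical patch)
```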
Citations: 118
Recovering Transparent Shape from Time-of-Flight Distortion
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.475
Kenichiro Tanaka, Y. Mukaigawa, Hiroyuki Kubo, Y. Matsushita, Y. Yagi
This paper presents a method for recovering the shape and surface normals of a transparent object from a single viewpoint using a Time-of-Flight (ToF) camera. Our method is built upon the fact that the speed of light varies with the refractive index of the medium, and therefore the depth measurement of a transparent object with a ToF camera may be distorted. We show that, from this ToF distortion, the refractive light path can be uniquely determined by estimating a single parameter. We estimate this parameter by enforcing consistency between the surface normal determined by a candidate light path and the one computed from the corresponding shape. The proposed method is evaluated in both simulation and real-world experiments and shows faithful transparent shape recovery.
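The core observation can be made concrete with a one-line relation: light crossing a slab of refractive index n over a geometric path t is slowed down, so a ToF sensor reports an optical path that is longer by (n - 1) * t. The toy calculation below uses hypothetical numbers and ignores the bending of the refracted ray; it only illustrates the measurement bias the paper exploits.

```python
# Toy illustration of ToF depth distortion caused by a transparent medium.
n = 1.5               # refractive index of the transparent object (e.g. glass)
t = 0.02              # geometric path length inside the object, in metres
d_background = 1.00   # true geometric distance to the surface behind it

# The camera measures optical path length, so the segment inside the
# object contributes n * t instead of t.
d_measured = d_background + (n - 1.0) * t
print(d_measured)     # -> 1.01  (1 cm of apparent extra depth)
```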
Citations: 36
A Probabilistic Collaborative Representation Based Approach for Pattern Classification
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.322
Sijia Cai, Lei Zhang, W. Zuo, Xiangchu Feng
Conventional representation based classifiers, ranging from the classical nearest neighbor classifier and nearest subspace classifier to the recently developed sparse representation based classifier (SRC) and collaborative representation based classifier (CRC), are essentially distance based classifiers. Though SRC and CRC have shown interesting classification results, their intrinsic classification mechanism remains unclear. In this paper we propose a probabilistic collaborative representation framework, where the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and computed. Consequently, we present a probabilistic collaborative representation based classifier (ProCRC), which jointly maximizes the likelihood that a test sample belongs to each of the multiple classes. The final classification is performed by checking which class has the maximum likelihood. The proposed ProCRC has a clear probabilistic interpretation, and it shows superior performance to many popular classifiers, including SRC, CRC and SVM. Coupled with the CNN features, it also leads to state-of-the-art classification results on a variety of challenging visual datasets.
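For intuition, the sketch below implements the plain collaborative representation step: an l2-regularized least-squares code over all training samples, followed by a per-class reconstruction residual. The probabilistic weighting that distinguishes ProCRC from vanilla CRC is omitted, and the data are random placeholders.

```python
import numpy as np

def crc_classify(X, labels, y, lam=1e-3):
    """Collaborative-representation-style classification (CRC baseline).

    X      : (d, n) matrix whose columns are training samples from all classes
    labels : (n,) class label of each column
    y      : (d,) test sample
    """
    d, n = X.shape
    # Closed-form ridge solution: alpha = (X^T X + lam I)^(-1) X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y using only the coefficients belonging to class c.
        residuals[c] = np.linalg.norm(y - X[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 30))
labels = np.repeat(np.arange(3), 10)
y = X[:, 5] + 0.05 * rng.standard_normal(20)   # noisy copy of a class-0 sample
print(crc_classify(X, labels, y))               # -> 0
```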
Citations: 262
Image Style Transfer Using Convolutional Neural Networks
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.265
Leon A. Gatys, Alexander S. Ecker, M. Bethge
Rendering the semantic content of an image in different styles is a difficult image processing task. Arguably, a major limiting factor for previous approaches has been the lack of image representations that explicitly represent semantic information and thus allow image content to be separated from style. Here we use image representations derived from Convolutional Neural Networks optimised for object recognition, which make high-level image information explicit. We introduce A Neural Algorithm of Artistic Style that can separate and recombine the image content and style of natural images. The algorithm allows us to produce new images of high perceptual quality that combine the content of an arbitrary photograph with the appearance of numerous well-known artworks. Our results provide new insights into the deep image representations learned by Convolutional Neural Networks and demonstrate their potential for high-level image synthesis and manipulation.
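The style representation used in this line of work is commonly a Gram matrix of feature activations, combined with a content loss in feature space. The NumPy sketch below writes out those two standard definitions; the random arrays stand in for CNN feature maps and the loss weighting is an arbitrary assumption, not the paper's exact configuration.

```python
import numpy as np

def gram_matrix(features):
    """features: (channels, height, width) feature map -> (channels, channels) Gram matrix."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(f_generated, f_content):
    return 0.5 * np.sum((f_generated - f_content) ** 2)

def style_loss(f_generated, f_style):
    g, a = gram_matrix(f_generated), gram_matrix(f_style)
    return np.sum((g - a) ** 2) / (4.0 * f_generated.shape[0] ** 2)

rng = np.random.default_rng(0)
f_gen, f_con, f_sty = (rng.random((64, 32, 32)) for _ in range(3))
total = content_loss(f_gen, f_con) + 1e3 * style_loss(f_gen, f_sty)
print(total)
```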
Citations: 4260
A Key Volume Mining Deep Framework for Action Recognition
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.219
Wangjiang Zhu, Jie Hu, Gang Sun, Xudong Cao, Y. Qiao
Recently, deep learning approaches have demonstrated remarkable progress for action recognition in videos. Most existing deep frameworks treat every volume (i.e., spatio-temporal video clip) equally and directly assign a video label to all volumes sampled from it. However, within a video, discriminative actions may occur sparsely in a few key volumes, and most other volumes are irrelevant to the labeled action category. Training with a large proportion of irrelevant volumes will hurt performance. To address this issue, we propose a key volume mining deep framework to identify key volumes and conduct classification simultaneously. Specifically, our framework is trained and optimized in an alternating manner integrated into the forward and backward stages of Stochastic Gradient Descent (SGD). In the forward pass, our network mines key volumes for each action class. In the backward pass, it updates network parameters with the help of these mined key volumes. In addition, we propose "Stochastic out" to model key volumes from multiple modalities, and an effective yet simple "unsupervised key volume proposal" method for high-quality volume sampling. Our experiments show that action recognition performance can be significantly improved by mining key volumes, and we achieve state-of-the-art performance on HMDB51 and UCF101 (93.1%).
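The "mine in the forward pass, update on the mined volumes in the backward pass" idea can be illustrated without a deep network: score every sampled volume, keep the top responders for the video-level label, and compute the loss only on those. The snippet below is a toy NumPy version of that selection step under invented scores, not the authors' training code.

```python
import numpy as np

def mine_key_volumes(volume_scores, video_label, k=2):
    """volume_scores: (num_volumes, num_classes) per-volume class scores.
    Return indices of the k volumes responding most strongly to the video label."""
    responses = volume_scores[:, video_label]
    return np.argsort(responses)[-k:]

rng = np.random.default_rng(0)
scores = rng.random((8, 5))              # 8 sampled volumes, 5 action classes
key = mine_key_volumes(scores, video_label=3, k=2)
loss = -np.log(scores[key, 3]).mean()    # cross-entropy-like loss on key volumes only
print(key, loss)
```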
Citations: 230
Progressively Parsing Interactional Objects for Fine Grained Action Detection
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.116
Bingbing Ni, Xiaokang Yang, Shenghua Gao
Fine-grained video action analysis often requires reliable detection and tracking of various interacting objects and human body parts, denoted as Interactional Object Parsing. However, most previous methods based on either independent or joint object detection may suffer from high model complexity and challenging image content, e.g., illumination/pose/appearance/scale variation, motion, and occlusion. In this work, we propose an end-to-end system based on a recurrent neural network to perform frame-by-frame interactional object parsing, which alleviates the difficulty in an incremental/progressive manner. Our key innovation is that, instead of jointly outputting all object detections at once, for each frame we use a set of long short-term memory (LSTM) nodes to incrementally refine the detections. After passing through each LSTM node, more object detections are consolidated, and thus more contextual information can be utilized to localize more difficult objects. The object parsing results are further utilized to form object-specific action representations for fine-grained action detection. Extensive experiments on two benchmark fine-grained activity datasets demonstrate that our proposed algorithm achieves better interacting-object detection performance, which in turn boosts action recognition performance over the state of the art.
Citations: 75
FANNG: Fast Approximate Nearest Neighbour Graphs
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.616
Ben Harwood, T. Drummond
We present a new method for approximate nearest neighbour search on large datasets of high-dimensional feature vectors, such as SIFT or GIST descriptors. Our approach constructs a directed graph that can be efficiently explored for nearest neighbour queries. Each vertex in this graph represents a feature vector from the dataset being searched. The directed edges are computed by exploiting the fact that, for these datasets, the intrinsic dimensionality of the local manifold-like structure formed by the elements of the dataset is significantly lower than that of the embedding space. We also provide an efficient search algorithm that uses this graph to rapidly find the nearest neighbour to a query with high probability. We show how the method can be adapted to give a strong guarantee of 100% recall where the query is within a threshold distance of its nearest neighbour. We demonstrate that our method is significantly more efficient than existing state-of-the-art methods. In particular, our GPU implementation can deliver 90% recall for queries on a data set of 1 million SIFT descriptors at a rate of over 1.2 million queries per second on a Titan X. Finally, we also demonstrate how our method scales to datasets of 5M and 20M entries.
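The search side of such graph-based methods is typically a greedy walk: start at some vertex, repeatedly hop to the out-neighbour closest to the query, and stop when no neighbour improves. The sketch below implements that generic greedy routine over a hypothetical randomly wired directed graph; FANNG's actual edge construction and backtracking search strategy are more involved.

```python
import numpy as np

def greedy_search(query, vectors, out_edges, start=0):
    """Greedy nearest-neighbour walk on a directed graph.

    vectors   : (n, d) dataset vectors
    out_edges : dict mapping vertex index -> list of out-neighbour indices
    """
    current = start
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        best, best_dist = current, current_dist
        for nb in out_edges[current]:
            d = np.linalg.norm(vectors[nb] - query)
            if d < best_dist:
                best, best_dist = nb, d
        if best == current:          # no out-neighbour is closer: local optimum
            return current
        current, current_dist = best, best_dist

rng = np.random.default_rng(0)
vecs = rng.random((100, 16))
# Hypothetical graph: connect each vertex to 8 random out-neighbours.
edges = {i: list(rng.choice(100, size=8, replace=False)) for i in range(100)}
q = rng.random(16)
print(greedy_search(q, vecs, edges))
```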
Citations: 87
Metric Learning as Convex Combinations of Local Models with Generalization Guarantees
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.164
Valentina Zantedeschi, R. Emonet, M. Sebban
Over the past ten years, metric learning has enabled improvements to numerous machine learning approaches that manipulate distances or similarities. In this field, local metric learning has been shown to be very efficient, especially for taking into account nonlinearities in the data and better capturing the peculiarities of the application of interest. However, it is well known that local metric learning (i) can entail overfitting and (ii) faces difficulties when comparing two instances that are assigned to two different local models. In this paper, we address these two issues by introducing a novel metric learning algorithm that linearly combines local models (C2LM). Starting from a partition of the space into regions and a model (a score function) for each region, C2LM defines a metric between points as a weighted combination of the models. A weight vector is learned for each pair of regions, and a spatial regularization ensures that the weight vectors evolve smoothly and that nearby models are favored in the combination. The proposed approach has the particularity of working in a regression setting, of working implicitly at different scales, and of being generic enough to be applicable to both similarities and distances. We prove theoretical guarantees for the approach using the framework of algorithmic robustness. We carry out experiments with datasets using both distances (perceptual color distances, using Mahalanobis-like distances) and similarities (semantic word similarities, using bilinear forms), showing that C2LM consistently improves regression accuracy even when the amount of training data is small.
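The central object, a metric defined as a weighted combination of local models, can be written down in a few lines. The sketch below evaluates a convex combination of per-region squared Mahalanobis-like distances with fixed, hand-picked weights and matrices; learning those weights with spatial regularization, as the paper does, is not shown.

```python
import numpy as np

def combined_distance(x, y, local_mats, weights):
    """Distance as a convex combination of local (squared) Mahalanobis-like models:
    d(x, y) = sum_k w_k * (x - y)^T M_k (x - y), with w_k >= 0 and sum_k w_k = 1."""
    diff = x - y
    return sum(w * diff @ M @ diff for w, M in zip(weights, local_mats))

rng = np.random.default_rng(0)
dim = 4
# Two hypothetical local models (positive semi-definite matrices) and convex weights.
mats = [np.eye(dim), np.diag([2.0, 1.0, 0.5, 0.25])]
weights = np.array([0.7, 0.3])
x, y = rng.random(dim), rng.random(dim)
print(combined_distance(x, y, mats, weights))
```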
Citations: 13
HyperDepth: Learning Depth from Structured Light without Matching
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.587
S. Fanello, Christoph Rhemann, V. Tankovich, Adarsh Kowdle, Sergio Orts, David Kim, S. Izadi
Structured light sensors are popular due to their robustness to untextured scenes and multipath. These systems triangulate depth by solving a correspondence problem between each camera and projector pixel. This is often framed as a local stereo matching task, correlating patches of pixels in the observed and reference image. However, this is computationally intensive, leading to reduced depth accuracy and framerate. We contribute an algorithm for solving this correspondence problem efficiently, without compromising depth accuracy. For the first time, this problem is cast as a classification-regression task, which we solve extremely efficiently using an ensemble of cascaded random forests. Our algorithm scales in the number of disparities, and each pixel can be processed independently and in parallel. No matching or even access to the corresponding reference pattern is required at runtime, and regressed labels are directly mapped to depth. Our GPU-based algorithm runs at 1 kHz for 1.3 MP input/output images, with a disparity error of 0.1 subpixels. We show a prototype high-framerate depth camera running at 375 Hz, useful for solving tracking-related problems. We demonstrate our algorithmic performance, creating high-resolution real-time depth maps that surpass the quality of current state-of-the-art depth technologies, highlighting quantization-free results with reduced holes, edge fattening and other stereo-based depth artifacts.
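To make the classification-regression framing concrete, the sketch below trains a scikit-learn random forest to predict a coarse, discretised disparity label per pixel from a small 1-D intensity patch of a synthetic shifted dot pattern. It is a toy stand-in under invented data and a single flat forest, not the cascaded forests, per-scanline training, or subpixel regression used in HyperDepth.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
width, half = 64, 4
pattern = rng.random(width + 16)          # fixed "projected" dot pattern (1-D toy)

def sample(disparity):
    """Observed 9-pixel patch around a fixed pixel when the pattern is shifted."""
    row = pattern[disparity:disparity + width] + 0.02 * rng.standard_normal(width)
    centre = width // 2
    return row[centre - half:centre + half + 1]

# Training set: patches labelled with the disparity that produced them.
train_d = rng.integers(0, 16, size=2000)
X_train = np.stack([sample(d) for d in train_d])
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, train_d)

# Evaluate on fresh noisy patches; this toy setup should score close to 1.0.
test_d = rng.integers(0, 16, size=200)
X_test = np.stack([sample(d) for d in test_d])
print(forest.score(X_test, test_d))
```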
Citations: 99