
Latest publications: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Pull the Plug? Predicting If Computers or Humans Should Segment Images
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.48
D. Gurari, S. Jain, Margrit Betke, K. Grauman
Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods. The framework is based on a proposed prediction module that estimates the quality of given algorithm-drawn segmentations. We demonstrate the value of the framework for two novel tasks related to "pulling the plug" on computer and human annotators. Specifically, we implement two systems that automatically decide, for a batch of images, when to replace 1) humans with computers to create coarse segmentations required to initialize segmentation tools and 2) computers with humans to create final, fine-grained segmentations. Experiments demonstrate the advantage of relying on a mix of human and computer efforts over relying on either resource alone for segmenting objects in three diverse datasets representing visible, phase contrast microscopy, and fluorescence microscopy images.
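As a rough illustration of the allocation decision, the sketch below (Python, with toy scores and a hypothetical `allocate_annotation_budget` helper) routes the images whose automatic segmentations are predicted to be worst to human annotators, spending a fixed budget. The paper's prediction module that would produce the quality scores is assumed to exist and is not reproduced here.

```python
import numpy as np

def allocate_annotation_budget(quality_scores, budget):
    """Route images to human annotators given predicted quality scores for
    their algorithm-drawn segmentations (one score per image, higher = better)
    and a fixed budget of human annotations.

    The `budget` images predicted to have the worst automatic segmentations
    are redirected to humans; the rest keep their automatic results.
    """
    quality_scores = np.asarray(quality_scores, dtype=float)
    budget = min(budget, len(quality_scores))
    worst_first = np.argsort(quality_scores)   # ascending: lowest quality first
    return worst_first[:budget]

# Toy example: 6 images, budget of 2 human annotations.
predicted_quality = [0.91, 0.34, 0.78, 0.12, 0.66, 0.88]
to_human = allocate_annotation_budget(predicted_quality, budget=2)
print(to_human)  # [3 1] -> these two images are segmented by humans
```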
{"title":"Pull the Plug? Predicting If Computers or Humans Should Segment Images","authors":"D. Gurari, S. Jain, Margrit Betke, K. Grauman","doi":"10.1109/CVPR.2016.48","DOIUrl":"https://doi.org/10.1109/CVPR.2016.48","url":null,"abstract":"Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods. The framework is based on a proposed prediction module that estimates the quality of given algorithm-drawn segmentations. We demonstrate the value of the framework for two novel tasks related to \"pulling the plug\" on computer and human annotators. Specifically, we implement two systems that automatically decide, for a batch of images, when to replace 1) humans with computers to create coarse segmentations required to initialize segmentation tools and 2) computers with humans to create final, fine-grained segmentations. Experiments demonstrate the advantage of relying on a mix of human and computer efforts over relying on either resource alone for segmenting objects in three diverse datasets representing visible, phase contrast microscopy, and fluorescence microscopy images.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"16 1","pages":"382-391"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91022843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
Deep Relative Distance Learning: Tell the Difference between Similar Vehicles
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.238
Hongye Liu, Yonghong Tian, Yaowei Wang, Lu Pang, Tiejun Huang
The growing explosion in the use of surveillance cameras in public security highlights the importance of vehicle search from a large-scale image or video database. However, compared with person re-identification or face recognition, the vehicle search problem has long been neglected by researchers in the vision community. This paper focuses on an interesting but challenging problem, vehicle re-identification (a.k.a. precise vehicle search). We propose a Deep Relative Distance Learning (DRDL) method which exploits a two-branch deep convolutional network to project raw vehicle images into a Euclidean space where distance can be directly used to measure the similarity of any two vehicles. To further facilitate future research on this problem, we also present a carefully organized large-scale image database, "VehicleID", which includes multiple images of the same vehicle captured by different real-world cameras in a city. We evaluate our DRDL method on our VehicleID dataset and another recently released vehicle model classification dataset, "CompCars", in three sets of experiments: vehicle re-identification, vehicle model verification and vehicle retrieval. Experimental results show that our method achieves promising results and outperforms several state-of-the-art approaches.
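To make the "distance measures vehicle similarity" idea concrete, here is a minimal NumPy sketch of a generic margin-based relative-distance loss on embedding vectors. It is only in the spirit of DRDL; the paper's own objective and its two-branch network are not reproduced, and all values below are toy numbers.

```python
import numpy as np

def relative_distance_loss(anchor, positive, negative, margin=1.0):
    """Margin-based relative-distance loss on embeddings: the anchor should be
    closer (in squared Euclidean distance) to an image of the same vehicle
    (positive) than to a different vehicle (negative) by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy 4-D embeddings standing in for the output of a two-branch CNN.
a = np.array([0.1, 0.9, 0.0, 0.4])
p = np.array([0.2, 0.8, 0.1, 0.5])   # same vehicle -> nearby in the space
n = np.array([0.9, 0.1, 0.7, 0.0])   # different vehicle -> far away
print(relative_distance_loss(a, p, n))  # 0.0: the margin is already satisfied
```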
{"title":"Deep Relative Distance Learning: Tell the Difference between Similar Vehicles","authors":"Hongye Liu, Yonghong Tian, Yaowei Wang, Lu Pang, Tiejun Huang","doi":"10.1109/CVPR.2016.238","DOIUrl":"https://doi.org/10.1109/CVPR.2016.238","url":null,"abstract":"The growing explosion in the use of surveillance cameras in public security highlights the importance of vehicle search from a large-scale image or video database. However, compared with person re-identification or face recognition, vehicle search problem has long been neglected by researchers in vision community. This paper focuses on an interesting but challenging problem, vehicle re-identification (a.k.a precise vehicle search). We propose a Deep Relative Distance Learning (DRDL) method which exploits a two-branch deep convolutional network to project raw vehicle images into an Euclidean space where distance can be directly used to measure the similarity of arbitrary two vehicles. To further facilitate the future research on this problem, we also present a carefully-organized largescale image database \"VehicleID\", which includes multiple images of the same vehicle captured by different realworld cameras in a city. We evaluate our DRDL method on our VehicleID dataset and another recently-released vehicle model classification dataset \"CompCars\" in three sets of experiments: vehicle re-identification, vehicle model verification and vehicle retrieval. Experimental results show that our method can achieve promising results and outperforms several state-of-the-art approaches.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"10 1","pages":"2167-2175"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88931445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 599
Scale-Aware Alignment of Hierarchical Image Segmentation
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.46
Yuhua Chen, Dengxin Dai, J. Pont-Tuset, L. Gool
Image segmentation is a key component in many computer vision systems, and it is recovering a prominent spot in the literature as methods improve and overcome their limitations. The outputs of most recent algorithms are in the form of a hierarchical segmentation, which provides segmentation at different scales in a single tree-like structure. Commonly, these hierarchical methods start from some low-level features and are not aware of the scale information of the different regions in them. As such, one might need to work on many different levels of the hierarchy to find the objects in the scene. This work modifies existing hierarchical algorithms by improving their alignment, that is, by adjusting the depth of the regions in the tree to better couple depth and scale. To do so, we first train a regressor to predict the scale of regions using mid-level features. We then define the anchor slice as the set of regions that best balances over-segmentation and under-segmentation. The output of our method is an improved hierarchy, re-aligned by the anchor slice. To demonstrate the power of our method, we perform comprehensive experiments, which show that our method, as a post-processing step, can significantly improve the quality of hierarchical segmentation representations and ease the use of hierarchical image segmentation in high-level vision tasks such as object segmentation. We also show that the improvement generalizes well across different algorithms and datasets, with a low computational cost.
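The first step of the alignment, regressing region scale from mid-level features, can be sketched as below. The features and scale targets are synthetic stand-ins, and the choice of a random-forest regressor is an assumption for illustration, not necessarily what the paper uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins for mid-level region features (e.g. size, contour
# strength, texture statistics) and the true scale of each training region.
rng = np.random.default_rng(0)
region_features = rng.random((500, 8))                        # 500 regions, 8 features
region_scale = 0.7 * region_features[:, 0] + rng.normal(0, 0.05, 500)

# Train a regressor that predicts region scale from mid-level features.
scale_regressor = RandomForestRegressor(n_estimators=50, random_state=0)
scale_regressor.fit(region_features, region_scale)

# At test time, predicted scales indicate how deep each region should sit in
# the hierarchy, so regions of similar predicted scale can be re-aligned onto
# a common level (the basis for choosing the anchor slice).
print(scale_regressor.predict(region_features[:5]))
```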
{"title":"Scale-Aware Alignment of Hierarchical Image Segmentation","authors":"Yuhua Chen, Dengxin Dai, J. Pont-Tuset, L. Gool","doi":"10.1109/CVPR.2016.46","DOIUrl":"https://doi.org/10.1109/CVPR.2016.46","url":null,"abstract":"Image segmentation is a key component in many computer vision systems, and it is recovering a prominent spot in the literature as methods improve and overcome their limitations. The outputs of most recent algorithms are in the form of a hierarchical segmentation, which provides segmentation at different scales in a single tree-like structure. Commonly, these hierarchical methods start from some low-level features, and are not aware of the scale information of the different regions in them. As such, one might need to work on many different levels of the hierarchy to find the objects in the scene. This work tries to modify the existing hierarchical algorithm by improving their alignment, that is, by trying to modify the depth of the regions in the tree to better couple depth and scale. To do so, we first train a regressor to predict the scale of regions using mid-level features. We then define the anchor slice as the set of regions that better balance between over-segmentation and under-segmentation. The output of our method is an improved hierarchy, re-aligned by the anchor slice. To demonstrate the power of our method, we perform comprehensive experiments, which show that our method, as a post-processing step, can significantly improve the quality of the hierarchical segmentation representations, and ease the usage of hierarchical image segmentation to high-level vision tasks such as object segmentation. We also prove that the improvement generalizes well across different algorithms and datasets, with a low computational cost.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"45 1","pages":"364-372"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90800903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Learning Activity Progression in LSTMs for Activity Detection and Early Detection
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.214
Shugao Ma, L. Sigal, S. Sclaroff
In this work we improve the training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short-Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model for violating such monotonicity; they are used together with the classification loss when training LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.
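The score-monotonicity constraint translates into a simple ranking penalty. The sketch below is a simplified NumPy version of that idea; the paper also describes a margin-based variant, and in training the term is added to the LSTM's classification loss rather than computed in isolation as here.

```python
import numpy as np

def monotonicity_ranking_penalty(correct_class_scores):
    """Penalty for violating score monotonicity: as the model observes more
    of an activity, the detection score of the correct class should not
    decrease, so every drop between consecutive time steps is penalized."""
    s = np.asarray(correct_class_scores, dtype=float)
    drops = np.maximum(0.0, s[:-1] - s[1:])   # positive wherever the score fell
    return drops.sum()

# Toy per-frame detection scores of the correct activity category.
scores = [0.20, 0.35, 0.30, 0.50, 0.45, 0.80]
print(round(monotonicity_ranking_penalty(scores), 2))   # 0.05 + 0.05 = 0.1
```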
{"title":"Learning Activity Progression in LSTMs for Activity Detection and Early Detection","authors":"Shugao Ma, L. Sigal, S. Sclaroff","doi":"10.1109/CVPR.2016.214","DOIUrl":"https://doi.org/10.1109/CVPR.2016.214","url":null,"abstract":"In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"5 1","pages":"1942-1950"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89692520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 365
Prior-Less Compressible Structure from Motion
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.447
Chen Kong, S. Lucey
Many non-rigid 3D structures are not modelled well through a low-rank subspace assumption. This is problematic when it comes to their reconstruction through Structure from Motion (SfM). We argue in this paper that a more expressive and general assumption can be made around compressible 3D structures. The vision community, however, has hitherto struggled to formulate effective strategies for recovering such structures after projection without the aid of additional priors (e.g. temporal ordering, rigid substructures, etc.). In this paper we present a "prior-less" approach to solving compressible SfM. Specifically, we demonstrate how the problem of SfM - assuming compressible 3D structures - can be theoretically characterized as a block sparse dictionary learning problem. We validate our approach experimentally by demonstrating reconstructions of 3D structures that are intractable using current state-of-the-art low-rank SfM approaches.
{"title":"Prior-Less Compressible Structure from Motion","authors":"Chen Kong, S. Lucey","doi":"10.1109/CVPR.2016.447","DOIUrl":"https://doi.org/10.1109/CVPR.2016.447","url":null,"abstract":"Many non-rigid 3D structures are not modelled well through a low-rank subspace assumption. This is problematic when it comes to their reconstruction through Structure from Motion (SfM). We argue in this paper that a more expressive and general assumption can be made around compressible 3D structures. The vision community, however, has hitherto struggled to formulate effective strategies for recovering such structures after projection without the aid of additional priors (e.g. temporal ordering, rigid substructures, etc.). In this paper we present a \"prior-less\" approach to solve compressible SfM. Specifically, we demonstrate how the problem of SfM - assuming compressible 3D structures - can be theoretically characterized as a block sparse dictionary learning problem. We validate our approach experimentally by demonstrating reconstructions of 3D structures that are intractable using current state-of-theart low-rank SfM approaches.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"55 7 1","pages":"4123-4131"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85823632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 50
Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.633
Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, A. Hengel, Javen Qinfeng Shi
Trace-norm regularization plays an important role in many areas such as computer vision and machine learning. When solving general large-scale trace-norm regularized problems, existing methods can be computationally expensive because they require many high-dimensional truncated singular value decompositions (SVDs) or are unaware of the matrix ranks. In this paper, we propose a proximal Riemannian pursuit (PRP) paradigm which addresses a sequence of trace-norm regularized subproblems defined on nonlinear matrix varieties. To address each subproblem, we extend the proximal gradient method on vector space to nonlinear matrix varieties, in which the SVDs of intermediate solutions are maintained by cheap low-rank QR decompositions, thereby making the proposed method more scalable. Empirical studies on several tasks, such as matrix completion and low-rank representation based subspace clustering, demonstrate the competitive performance of the proposed paradigm over existing methods.
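For context, the basic building block such solvers repeat is the proximal operator of the trace norm, i.e. singular-value soft-thresholding. The sketch below shows that standard step on a synthetic denoising problem; it deliberately does not reproduce PRP itself, whose point is to avoid these full SVDs by working on a low-rank matrix variety with cheap QR updates.

```python
import numpy as np

def prox_trace_norm(X, tau):
    """Proximal operator of tau * ||X||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# Synthetic denoising: min_X 0.5*||X - Y||_F^2 + tau*||X||_* has the
# closed-form solution X = prox_trace_norm(Y, tau).
rng = np.random.default_rng(0)
clean = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 15))   # rank 3
Y = clean + 0.1 * rng.standard_normal((20, 15))
X_hat = prox_trace_norm(Y, tau=1.0)
print(np.linalg.matrix_rank(X_hat))   # recovers a low rank (3 here) instead of 15
```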
{"title":"Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization","authors":"Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, A. Hengel, Javen Qinfeng Shi","doi":"10.1109/CVPR.2016.633","DOIUrl":"https://doi.org/10.1109/CVPR.2016.633","url":null,"abstract":"Trace-norm regularization plays an important role in many areas such as computer vision and machine learning. When solving general large-scale trace-norm regularized problems, existing methods may be computationally expensive due to many high-dimensional truncated singular value decompositions (SVDs) or the unawareness of matrix ranks. In this paper, we propose a proximal Riemannian pursuit (PRP) paradigm which addresses a sequence of trace-norm regularized subproblems defined on nonlinear matrix varieties. To address the subproblem, we extend the proximal gradient method on vector space to nonlinear matrix varieties, in which the SVDs of intermediate solutions are maintained by cheap low-rank QR decompositions, therefore making the proposed method more scalable. Empirical studies on several tasks, such as matrix completion and low-rank representation based subspace clustering, demonstrate the competitive performance of the proposed paradigms over existing methods.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"49 1","pages":"5877-5886"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86334233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Geometry-Informed Material Recognition
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.172
Joseph DeGol, M. G. Fard, Derek Hoiem
Our goal is to recognize material categories using images and geometry information. In many applications, such as construction management, coarse geometry information is available. We investigate how 3D geometry (surface normals, camera intrinsic and extrinsic parameters) can be used together with 2D features (texture and color) to improve material classification. We introduce a new dataset, GeoMat, which is the first to provide both image and geometry data in the form of: (i) training and testing patches extracted at different scales and perspectives from real-world examples of each material category, and (ii) a large-scale construction site scene that includes 160 images and over 800,000 hand-labeled 3D points. Our results show that using 2D and 3D features both jointly and independently to model materials improves classification accuracy across multiple scales and viewing directions, for both material patches and images of a large-scale construction site scene.
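A minimal sketch of the "joint 2D + 3D features" idea is plain feature concatenation before classification, shown below on synthetic data; the paper's actual features, splits, and classifier may differ, so treat this purely as an illustration of the fusion step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 2D patch features (texture/color) and 3D features
# (e.g. surface-normal statistics from coarse geometry), 5 material classes.
rng = np.random.default_rng(0)
n = 600
feat_2d = rng.random((n, 32))
feat_3d = rng.random((n, 8))
labels = rng.integers(0, 5, size=n)

# Joint model: concatenate 2D and 3D features, then train a single classifier.
X = np.hstack([feat_2d, feat_3d])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))   # near chance here, since the features are random
```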
{"title":"Geometry-Informed Material Recognition","authors":"Joseph DeGol, M. G. Fard, Derek Hoiem","doi":"10.1109/CVPR.2016.172","DOIUrl":"https://doi.org/10.1109/CVPR.2016.172","url":null,"abstract":"Our goal is to recognize material categories using images and geometry information. In many applications, such as construction management, coarse geometry information is available. We investigate how 3D geometry (surface normals, camera intrinsic and extrinsic parameters) can be used with 2D features (texture and color) to improve material classification. We introduce a new dataset, GeoMat, which is the first to provide both image and geometry data in the form of: (i) training and testing patches that were extracted at different scales and perspectives from real world examples of each material category, and (ii) a large scale construction site scene that includes 160 images and over 800,000 hand labeled 3D points. Our results show that using 2D and 3D features both jointly and independently to model materials improves classification accuracy across multiple scales and viewing directions for both material patches and images of a large scale construction site scene.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"68 1","pages":"1554-1562"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77725630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
The Global Patch Collider
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.21
Shenlong Wang, S. Fanello, Christoph Rhemann, S. Izadi, Pushmeet Kohli
This paper proposes a novel, extremely efficient, fully parallelizable, task-specific algorithm for the computation of global point-wise correspondences in images and videos. Our algorithm, the Global Patch Collider, is based on detecting unique collisions between image points using a collection of learned tree structures that act as conditional hash functions. In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures. The split functions stored at the intermediate nodes of the trees are trained to ensure that only visually similar patches, or their geometrically or photometrically transformed versions, fall into the same leaf node. The matching process involves passing all pixel positions in the images under analysis through the tree structures. We then compute matches by isolating points that uniquely collide with each other, i.e., fall into the same otherwise-empty leaf in multiple trees. Our algorithm is linear in the number of pixels but can be made constant time on a parallel computation architecture, as the tree traversal for individual image points is decoupled. We demonstrate the efficacy of our method by using it to perform optical flow matching and stereo matching on some challenging benchmarks. Experimental results show that not only is our method extremely computationally efficient, but it is also able to match or outperform state-of-the-art methods that are much more complex.
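The matching step can be sketched as below, assuming the leaf index reached in every learned tree has already been computed for each pixel; the learned split functions and the tree traversal itself are not shown, and all names and toy values are hypothetical. A correspondence is kept only when a tuple of leaf indices is hit by exactly one pixel in each image.

```python
from collections import defaultdict

def unique_collisions(leaf_codes_a, leaf_codes_b):
    """Match pixels across two images by their leaf codes: the tuple of leaf
    indices a pixel reaches in every learned tree. A match is declared only
    when a code is hit by exactly one pixel in each image (a unique collision)."""
    buckets_a, buckets_b = defaultdict(list), defaultdict(list)
    for pix, code in leaf_codes_a.items():
        buckets_a[code].append(pix)
    for pix, code in leaf_codes_b.items():
        buckets_b[code].append(pix)
    matches = []
    for code, pixels_a in buckets_a.items():
        pixels_b = buckets_b.get(code, [])
        if len(pixels_a) == 1 and len(pixels_b) == 1:
            matches.append((pixels_a[0], pixels_b[0]))
    return matches

# Toy leaf codes over 3 trees: pixel (row, col) -> (leaf id in each tree).
codes_a = {(10, 12): (3, 7, 1), (40, 55): (2, 2, 9), (41, 55): (2, 2, 9)}
codes_b = {(11, 13): (3, 7, 1), (40, 56): (2, 2, 9)}
print(unique_collisions(codes_a, codes_b))
# [((10, 12), (11, 13))] -- the ambiguous (2, 2, 9) bucket is discarded
```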
{"title":"The Global Patch Collider","authors":"Shenlong Wang, S. Fanello, Christoph Rhemann, S. Izadi, Pushmeet Kohli","doi":"10.1109/CVPR.2016.21","DOIUrl":"https://doi.org/10.1109/CVPR.2016.21","url":null,"abstract":"This paper proposes a novel extremely efficient, fully-parallelizable, task-specific algorithm for the computation of global point-wise correspondences in images and videos. Our algorithm, the Global Patch Collider, is based on detecting unique collisions between image points using a collection of learned tree structures that act as conditional hash functions. In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures. The split functions stored at the intermediate nodes of the trees are trained to ensure that only visually similar patches or their geometric or photometric transformed versions fall into the same leaf node. The matching process involves passing all pixel positions in the images under analysis through the tree structures. We then compute matches by isolating points that uniquely collide with each other ie. fell in the same empty leaf in multiple trees. Our algorithm is linear in the number of pixels but can be made constant time on a parallel computation architecture as the tree traversal for individual image points is decoupled. We demonstrate the efficacy of our method by using it to perform optical flow matching and stereo matching on some challenging benchmarks. Experimental results show that not only is our method extremely computationally efficient, but it is also able to match or outperform state of the art methods that are much more complex.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"127-135"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82143987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
A Holistic Approach to Cross-Channel Image Noise Modeling and Its Application to Image Denoising
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.186
Seonghyeon Nam, Youngbae Hwang, Y. Matsushita, Seon Joo Kim
Modelling and analyzing noise in images is a fundamental task in many computer vision systems. Traditionally, noise has been modelled per color channel under the assumption that the color channels are independent. Although the color channels can be considered mutually independent in camera RAW images, signals from different color channels get mixed during the imaging process inside the camera due to gamut mapping, tone-mapping, and compression. We show the influence of the in-camera imaging pipeline on noise and propose a new noise model in the 3D RGB space to account for the color channel mix-ups. A data-driven approach for determining the parameters of the new noise model is introduced, as well as its application to image denoising. The experiments show that our noise model represents the noise in regular JPEG images more accurately than previous models and is advantageous for image denoising.
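The contrast between the two modelling choices can be illustrated with synthetic residuals, as below: a per-channel model keeps only three variances, while a cross-channel model in the 3D RGB space estimates a full 3x3 covariance whose off-diagonal entries capture the channel mix-ups. The mixing matrix here is invented for illustration and is not the paper's estimated model.

```python
import numpy as np

# Synthetic noise residuals (R, G, B) from flat patches, with the channels
# mixed to mimic the effect of in-camera gamut/tone mapping and compression.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.3, 0.1],
                   [0.3, 1.0, 0.3],
                   [0.1, 0.3, 1.0]])
rgb_noise = rng.normal(0.0, 2.0, size=(10000, 3)) @ mixing.T

# Per-channel model: independent variances, off-diagonals assumed to be zero.
per_channel_var = rgb_noise.var(axis=0)

# Cross-channel model: full covariance in the 3D RGB space.
cross_channel_cov = np.cov(rgb_noise, rowvar=False)

print(np.round(per_channel_var, 2))
print(np.round(cross_channel_cov, 2))   # clearly non-zero off-diagonal terms
```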
{"title":"A Holistic Approach to Cross-Channel Image Noise Modeling and Its Application to Image Denoising","authors":"Seonghyeon Nam, Youngbae Hwang, Y. Matsushita, Seon Joo Kim","doi":"10.1109/CVPR.2016.186","DOIUrl":"https://doi.org/10.1109/CVPR.2016.186","url":null,"abstract":"Modelling and analyzing noise in images is a fundamental task in many computer vision systems. Traditionally, noise has been modelled per color channel assuming that the color channels are independent. Although the color channels can be considered as mutually independent in camera RAW images, signals from different color channels get mixed during the imaging process inside the camera due to gamut mapping, tone-mapping, and compression. We show the influence of the in-camera imaging pipeline on noise and propose a new noise model in the 3D RGB space to accounts for the color channel mix-ups. A data-driven approach for determining the parameters of the new noise model is introduced as well as its application to image denoising. The experiments show that our noise model represents the noise in regular JPEG images more accurately compared to the previous models and is advantageous in image denoising.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"1683-1691"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81715121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 184
Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition
Pub Date: 2016-06-27 DOI: 10.1109/CVPR.2016.324
Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu
RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide application scenarios. While much research has been conducted, most work has used hand-crafted features, which struggle to capture high-level semantic structures. Recently, features extracted from deep convolutional neural networks have produced state-of-the-art results for various computer vision tasks, which inspires researchers to explore incorporating CNN-learned features for RGBD scene understanding. On the other hand, most existing work combines RGB and depth features without adequately exploiting the consistency and complementary information between them. Inspired by recent work on RGBD object recognition using multi-modal feature fusion, we introduce, for the first time, a discriminative multi-modal fusion framework for RGBD scene recognition which simultaneously considers the inter- and intra-modality correlation for all samples while regularizing the learned features to be discriminative and compact. The results from the multi-modal layer can be back-propagated to the lower CNN layers, hence the parameters of the CNN layers and multi-modal layers are updated iteratively until convergence. Experiments on the recently proposed large-scale SUN RGB-D dataset show that our method achieves state-of-the-art performance without any image segmentation.
{"title":"Discriminative Multi-modal Feature Fusion for RGBD Indoor Scene Recognition","authors":"Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu","doi":"10.1109/CVPR.2016.324","DOIUrl":"https://doi.org/10.1109/CVPR.2016.324","url":null,"abstract":"RGBD scene recognition has attracted increasingly attention due to the rapid development of depth sensors and their wide application scenarios. While many research has been conducted, most work used hand-crafted features which are difficult to capture high-level semantic structures. Recently, the feature extracted from deep convolutional neural network has produced state-of-the-art results for various computer vision tasks, which inspire researchers to explore incorporating CNN learned features for RGBD scene understanding. On the other hand, most existing work combines rgb and depth features without adequately exploiting the consistency and complementary information between them. Inspired by some recent work on RGBD object recognition using multi-modal feature fusion, we introduce a novel discriminative multi-modal fusion framework for rgbd scene recognition for the first time which simultaneously considers the inter-and intra-modality correlation for all samples and meanwhile regularizing the learned features to be discriminative and compact. The results from the multimodal layer can be back-propagated to the lower CNN layers, hence the parameters of the CNN layers and multimodal layers are updated iteratively until convergence. Experiments on the recently proposed large scale SUN RGB-D datasets show that our method achieved the state-of-the-art without any image segmentation.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"2969-2976"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90067423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 97