
2014 IEEE Conference on Computer Vision and Pattern Recognition: Latest Publications

Co-localization in Real-World Images
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.190
K. Tang, Armand Joulin, Li-Jia Li, Li Fei-Fei
In this paper, we tackle the problem of co-localization in real-world images. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images. Although similar problems such as co-segmentation and weakly supervised localization have been previously studied, we focus on being able to perform co-localization in real-world settings, which are typically characterized by large amounts of intra-class variation, inter-class diversity, and annotation noise. To address these issues, we present a joint image-box formulation for solving the co-localization problem, and show how it can be relaxed to a convex quadratic program which can be efficiently solved. We perform an extensive evaluation of our method compared to previous state-of-the-art approaches on the challenging PASCAL VOC 2007 and Object Discovery datasets. In addition, we also present a large-scale study of co-localization on ImageNet, involving ground-truth annotations for 3,624 classes and approximately 1 million images.
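The relaxation described above can be pictured as a small quadratic program: one selection variable per candidate box, a similarity graph linking boxes across images, and a simplex constraint per image. The sketch below is a minimal illustration under those assumptions, not the authors' actual objective; the RBF similarity graph, the Laplacian cost, and the projected-gradient solver (the hypothetical helpers project_simplex and colocalize) are stand-ins.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex {x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def colocalize(box_feats, image_ids, n_iter=300):
    """Soft-select one box per image by minimizing z^T L z over per-image simplices.

    box_feats : (n_boxes, d) appearance descriptors of candidate boxes
    image_ids : (n_boxes,)   index of the image each box belongs to
    """
    # Similarity graph between candidate boxes and its (PSD) Laplacian.
    d2 = ((box_feats[:, None, :] - box_feats[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (d2.mean() + 1e-12))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W

    # Uniform initialization on each image's simplex.
    z = np.ones(len(box_feats))
    for img in np.unique(image_ids):
        z[image_ids == img] /= (image_ids == img).sum()

    step = 1.0 / (2.0 * np.linalg.norm(L, 2))   # safe step size for the convex quadratic
    for _ in range(n_iter):
        z = z - step * 2.0 * (L @ z)            # gradient of z^T L z
        for img in np.unique(image_ids):        # project back onto each image's simplex
            m = image_ids == img
            z[m] = project_simplex(z[m])
    return z

# Toy usage: 3 images with 4 candidate boxes each; argmax per image picks the box.
rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 16))
ids = np.repeat(np.arange(3), 4)
print(colocalize(feats, ids).round(2))
```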
Citations: 184
Scalable Multitask Representation Learning for Scene Classification
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.186
Maksim Lapin, B. Schiele, Matthias Hein
The underlying idea of multitask learning is that learning tasks jointly is better than learning each task individually. In particular, if only a few training examples are available for each task, sharing a jointly trained representation improves classification performance. In this paper, we propose a novel multitask learning method that learns a low-dimensional representation jointly with the corresponding classifiers, which are then able to profit from the latent inter-class correlations. Our method scales with respect to the original feature dimension and can be used with high-dimensional image descriptors such as the Fisher Vector. Furthermore, it consistently outperforms the current state of the art on the SUN397 scene classification benchmark with varying amounts of training data.
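To make the idea concrete, here is a minimal sketch of learning a shared low-dimensional projection jointly with per-task linear classifiers by alternating ridge-style updates. The factorization of the classifier matrix as U V, the toy Gaussian data, and the update rules are illustrative assumptions; the paper's actual formulation and large-scale solver differ.

```python
import numpy as np

def multitask_lowrank(X, Y, k=8, reg=1.0, n_alt=20):
    """Jointly fit a shared d x k projection U and per-task weights V (k x T)
    so that scores = X @ U @ V approximate the one-hot targets Y.
    Simple alternating ridge updates; a stand-in for the paper's objective."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(d, k))
    XtX, XtY = X.T @ X, X.T @ Y
    I_k = np.eye(k)
    for _ in range(n_alt):
        # Task classifiers given the shared representation X @ U.
        V = np.linalg.solve(U.T @ XtX @ U + reg * I_k, U.T @ XtY)
        # Shared projection given the classifiers (approximate ridge update).
        U = np.linalg.solve(XtX + reg * np.eye(d), XtY @ V.T) @ np.linalg.inv(V @ V.T + reg * I_k)
    return U, V

# Toy usage: 3 classes ("tasks"), Gaussian blobs in 32-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(50, 32)) for c in (-1.0, 0.0, 1.0)])
y = np.repeat(np.arange(3), 50)
Y = np.eye(3)[y]
U, V = multitask_lowrank(X, Y, k=4)
pred = (X @ U @ V).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```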
Citations: 55
StoryGraphs: Visualizing Character Interactions as a Timeline
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.111
Makarand Tapaswi, M. Bäuml, R. Stiefelhagen
We present a novel way to automatically summarize and represent the storyline of a TV episode by visualizing character interactions as a chart. We also propose a scene detection method that lends itself well to generating over-segmented scenes, which are used to partition the video. The positioning of character lines in the chart is formulated as an optimization problem which trades off the aesthetics and functionality of the chart. Using automatic person identification, we present StoryGraphs for 3 diverse TV series encompassing a total of 22 episodes. We define quantitative criteria to evaluate StoryGraphs and also compare them against episode summaries to evaluate their ability to provide an overview of the episode.
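A rough sense of the layout optimization can be given by a toy energy that keeps each character line straight, pulls co-occurring characters together within a scene, and keeps a minimum gap between lines. The energy terms, weights, and the generic L-BFGS solver below are assumptions for illustration only, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

def storygraph_layout(presence, w_smooth=1.0, w_attract=2.0, w_repel=1.0, gap=1.0):
    """Lay out one line per character over scenes.

    presence : (n_chars, n_scenes) boolean, True if the character appears in the scene.
    Returns y-positions of shape (n_chars, n_scenes).
    """
    C, S = presence.shape

    def energy(flat):
        y = flat.reshape(C, S)
        e = w_smooth * np.sum((y[:, 1:] - y[:, :-1]) ** 2)   # straight, calm lines
        for s in range(S):
            for a in range(C):
                for b in range(a + 1, C):
                    d = y[a, s] - y[b, s]
                    if presence[a, s] and presence[b, s]:
                        e += w_attract * d ** 2               # co-occurring characters meet
                    e += w_repel * max(0.0, gap - abs(d)) ** 2  # lines keep a minimum gap
        return e

    y0 = np.tile(np.arange(C, dtype=float)[:, None] * gap, (1, S))
    res = minimize(energy, y0.ravel(), method="L-BFGS-B")
    return res.x.reshape(C, S)

# Toy usage: 3 characters over 5 scenes.
presence = np.array([[1, 1, 0, 1, 1],
                     [1, 0, 1, 1, 0],
                     [0, 1, 1, 0, 1]], dtype=bool)
print(storygraph_layout(presence).round(2))
```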
Citations: 62
Predicting Matchability
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.9
Wilfried Hartmann, M. Havlena, K. Schindler
The initial steps of many computer vision algorithms are interest point extraction and matching. In larger image sets the pairwise matching of interest point descriptors between images is an important bottleneck. For each descriptor in one image the (approximate) nearest neighbor in the other one has to be found and checked against the second-nearest neighbor to ensure the correspondence is unambiguous. Here, we ask how best to decimate the list of interest points without losing matches, i.e. we aim to speed up matching by filtering out, in advance, those points which would not survive the matching stage. It turns out that the best filtering criterion is not the response of the interest point detector, which in fact is not surprising: the goal of detection is repeatable and well-localized points, whereas the objective of selection is points whose descriptors can be matched successfully. We show that one can in fact learn to predict which descriptors are matchable, and thus reduce the number of interest points significantly without losing too many matches. We show that this strategy, as simple as it is, greatly improves the matching success with the same number of points per image. Moreover, we embed the prediction in a state-of-the-art Structure-from-Motion pipeline and demonstrate that it also outperforms other selection methods at the system level.
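In practice the idea amounts to training a binary classifier on descriptors labeled by whether they survived matching, then ranking new keypoints by predicted matchability before the expensive pairwise stage. The sketch below uses a scikit-learn random forest on synthetic descriptors and labels as stand-ins; the actual features, labeling procedure, and classifier in the paper may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-ins: descriptors from a training collection plus a 0/1 label saying
# whether each descriptor survived ratio-test matching against the other images.
rng = np.random.default_rng(0)
descs = rng.random((5000, 128)).astype(np.float32)        # e.g. SIFT-like descriptors
survived = (descs[:, :8].sum(axis=1) > 4.0).astype(int)   # fake "matchable" label

clf = RandomForestClassifier(n_estimators=50, max_depth=12, random_state=0)
clf.fit(descs, survived)

# At detection time: keep only the keypoints the classifier believes are
# matchable, before running pairwise descriptor matching.
new_descs = rng.random((2000, 128)).astype(np.float32)
matchability = clf.predict_proba(new_descs)[:, 1]
keep = np.argsort(-matchability)[:500]
print("kept", len(keep), "of", len(new_descs), "keypoints")
```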
Citations: 102
Detecting Objects Using Deformation Dictionaries
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.256
Bharath Hariharan, C. L. Zitnick, Piotr Dollár
Several popular and effective object detectors separately model intra-class variations arising from deformations and appearance changes. This reduces model complexity while enabling the detection of objects across changes in view- point, object pose, etc. The Deformable Part Model (DPM) is perhaps the most successful such model to date. A common assumption is that the exponential number of templates enabled by a DPM is critical to its success. In this paper, we show the counter-intuitive result that it is possible to achieve similar accuracy using a small dictionary of deformations. Each component in our model is represented by a single HOG template and a dictionary of flow fields that determine the deformations the template may undergo. While the number of candidate deformations is dramatically fewer than that for a DPM, the deformed templates tend to be plausible and interpretable. In addition, we discover that the set of deformation bases is actually transferable across object categories and that learning shared bases across similar categories can boost accuracy.
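The scoring idea can be sketched as follows: warp a single template by each flow field in a small dictionary and keep the best correlation with the image window. For brevity the sketch uses a single-channel template instead of multi-channel HOG and random flow fields instead of a learned dictionary; these are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_template(template, flow):
    """Warp a 2-D template by a dense flow field (dy, dx) of shape (2, H, W)."""
    H, W = template.shape
    yy, xx = np.mgrid[0:H, 0:W].astype(float)
    coords = np.stack([yy + flow[0], xx + flow[1]])
    return map_coordinates(template, coords, order=1, mode="nearest")

def score_window(window, template, flow_dict):
    """Best correlation of the window with any deformed version of the template."""
    return max(float(np.sum(window * warp_template(template, f))) for f in flow_dict)

# Toy usage: an 8x8 single-channel "template" and a dictionary of 3 flow fields.
rng = np.random.default_rng(0)
template = rng.random((8, 8))
flows = [np.zeros((2, 8, 8)),                       # identity (no deformation)
         np.full((2, 8, 8), 0.5),                   # small global shift
         rng.normal(scale=0.5, size=(2, 8, 8))]     # a random stand-in deformation
window = rng.random((8, 8))
print(score_window(window, template, flows))
```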
Citations: 18
Beyond Human Opinion Scores: Blind Image Quality Assessment Based on Synthetic Scores
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.540
Peng Ye, J. Kumar, D. Doermann
State-of-the-art general purpose Blind Image Quality Assessment (BIQA) models rely on examples of distorted images and corresponding human opinion scores to learn a regression function that maps image features to a quality score. These types of models are considered "opinion-aware" (OA) BIQA models. A large set of human scored training examples is usually required to train a reliable OA-BIQA model. However, obtaining human opinion scores through subjective testing is often expensive and time-consuming. It is therefore desirable to develop "opinion-free" (OF) BIQA models that do not require human opinion scores for training. This paper proposes BLISS (Blind Learning of Image Quality using Synthetic Scores). BLISS is a simple, yet effective method for extending OA-BIQA models to OF-BIQA models. Instead of training on human opinion scores, we propose to train BIQA models on synthetic scores derived from Full-Reference (FR) IQA measures. State-of-the-art FR measures yield high correlation with human opinion scores and can serve as approximations to human opinion scores. Unsupervised rank aggregation is applied to combine different FR measures to generate a synthetic score, which serves as a better "gold standard". Extensive experiments on standard IQA datasets show that BLISS significantly outperforms previous OF-BIQA methods and is comparable to state-of-the-art OA-BIQA methods.
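The pipeline can be sketched in a few lines: compute several full-reference quality scores for the training images, aggregate their ranks into a synthetic opinion score, and fit a regressor from no-reference features to that score. The FR scores, the features, the Borda-style rank averaging, and the gradient-boosting regressor below are placeholders, not the measures or aggregation method used in the paper.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_train = 400

# Stand-ins: no-reference features for each distorted training image, plus the
# scores several full-reference measures would assign to those images.  Real
# code would compute both from the image pairs.
nr_features = rng.normal(size=(n_train, 64))
quality = nr_features[:, 0] + 0.5 * nr_features[:, 1]           # hidden "true" quality
fr_scores = np.stack([quality + rng.normal(scale=s, size=n_train)
                      for s in (0.2, 0.4, 0.6)], axis=1)         # 3 noisy FR measures

# Unsupervised rank aggregation (Borda-style): average per-measure ranks to get
# a synthetic opinion score, so no human MOS is needed for training.
ranks = np.stack([rankdata(fr_scores[:, j]) for j in range(fr_scores.shape[1])], axis=1)
synthetic_score = ranks.mean(axis=1)

# Train an opinion-free BIQA regressor on the synthetic scores.
model = GradientBoostingRegressor(random_state=0).fit(nr_features, synthetic_score)

# At test time only the distorted image's NR features are needed.
test_features = rng.normal(size=(5, 64))
print(model.predict(test_features))
```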
Citations: 68
Object Partitioning Using Local Convexity
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.46
S. Stein, Markus Schoeler, Jeremie Papon, F. Wörgötter
The problem of how to arrive at an appropriate 3D-segmentation of a scene remains difficult. While current state-of-the-art methods continue to gradually improve in benchmark performance, they also grow more and more complex, for example by incorporating chains of classifiers, which require training on large manually annotated data-sets. As an alternative to this, we present a new, efficient learning- and model-free approach for the segmentation of 3D point clouds into object parts. The algorithm begins by decomposing the scene into an adjacency-graph of surface patches based on a voxel grid. Edges in the graph are then classified as either convex or concave using a novel combination of simple criteria which operate on the local geometry of these patches. This way the graph is divided into locally convex connected subgraphs, which -- with high accuracy -- represent object parts. Additionally, we propose a novel depth dependent voxel grid to deal with the decreasing point-density at far distances in the point clouds. This improves segmentation, allowing the use of fixed parameters for vastly different scenes. The algorithm is straightforward to implement and requires no training data, while nevertheless producing results that are comparable to state-of-the-art methods which incorporate high-level concepts involving classification, learning and model fitting.
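The core test can be illustrated with a few lines of geometry: an edge between two adjacent patches is kept when their normals open away from each other along the line joining the centroids, and the remaining convex edges define connected components that become object parts. The exact criterion, tolerance, and toy patches below are simplified assumptions rather than the paper's full set of criteria.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def convex_edge(c1, n1, c2, n2, angle_tol_deg=5.0):
    """Heuristic local-convexity test between two adjacent surface patches:
    treat the connection as convex when (n1 - n2) . (c1 - c2) > 0, with a small
    tolerance so nearly coplanar patches stay connected."""
    d = c1 - c2
    d = d / (np.linalg.norm(d) + 1e-12)
    return np.dot(n1 - n2, d) > -np.sin(np.radians(angle_tol_deg))

def segment_patches(centroids, normals, adjacency):
    """Cut concave edges of the patch adjacency graph and return part labels."""
    rows, cols = [], []
    for i, j in adjacency:                      # adjacency: list of patch index pairs
        if convex_edge(centroids[i], normals[i], centroids[j], normals[j]):
            rows.append(i)
            cols.append(j)
    n = len(centroids)
    graph = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    _, labels = connected_components(graph, directed=False)
    return labels

# Toy usage: two flat floor patches plus a rising patch meeting them concavely.
centroids = np.array([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.5]])
normals = np.array([[0., 0., 1.], [0., 0., 1.], [-0.6, 0., 0.8]])
adjacency = [(0, 1), (1, 2)]
print(segment_patches(centroids, normals, adjacency))   # e.g. [0 0 1]
```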
Citations: 163
Face Alignment at 3000 FPS via Regressing Local Binary Features
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.218
Shaoqing Ren, Xudong Cao, Yichen Wei, Jian Sun
This paper presents a highly efficient, very accurate regression approach for face alignment. Our approach has two novel components: a set of local binary features, and a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independently. The obtained local binary features are used to jointly learn a linear regression for the final output. Our approach achieves state-of-the-art results when tested on the current most challenging benchmarks. Furthermore, because extracting and regressing local binary features is computationally very cheap, our system is much faster than previous methods. It achieves over 3,000 fps on a desktop or 300 fps on a mobile phone for locating a few dozen landmarks.
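A rough sketch of the two-stage idea: per-landmark forests whose leaf indices act as sparse binary features, followed by one global linear regression over the concatenated features. The synthetic local features and targets, the scikit-learn forests, and the single (non-cascaded) stage below are simplifying assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_samples, n_landmarks, n_local_feats = 500, 5, 20

# Stand-ins: local features sampled around each current landmark estimate
# (shape-indexed pixel differences in the paper) and the target shape increments.
local_feats = rng.normal(size=(n_landmarks, n_samples, n_local_feats))
shape_delta = rng.normal(size=(n_samples, 2 * n_landmarks))

# Stage 1: one small forest per landmark, used only to encode local appearance:
# the leaf index of each tree becomes a sparse binary feature.
forests, encoders, binary_blocks = [], [], []
for l in range(n_landmarks):
    target = shape_delta[:, 2 * l: 2 * l + 2]            # this landmark's (dx, dy)
    forest = RandomForestRegressor(n_estimators=10, max_depth=4, random_state=l)
    forest.fit(local_feats[l], target)
    leaves = forest.apply(local_feats[l])                # (n_samples, n_trees) leaf ids
    enc = OneHotEncoder(handle_unknown="ignore").fit(leaves)
    forests.append(forest)
    encoders.append(enc)
    binary_blocks.append(enc.transform(leaves).toarray())

phi = np.hstack(binary_blocks)                           # concatenated local binary features

# Stage 2: one global linear regression from the binary features to the
# increments of all landmarks jointly.
global_reg = Ridge(alpha=1.0).fit(phi, shape_delta)
print("training residual:", np.abs(global_reg.predict(phi) - shape_delta).mean())
```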
Citations: 867
Transformation Pursuit for Image Classification
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.466
Mattis Paulin, Jérôme Revaud, Zaïd Harchaoui, F. Perronnin, C. Schmid
A simple approach to learning invariances in image classification consists in augmenting the training set with transformed versions of the original images. However, given a large set of possible transformations, selecting a compact subset is challenging. Indeed, not all transformations are equally informative, and adding uninformative transformations increases training time with no gain in accuracy. We propose a principled algorithm -- Image Transformation Pursuit (ITP) -- for the automatic selection of a compact set of transformations. ITP works in a greedy fashion, selecting at each iteration the transformation that yields the highest accuracy gain. ITP also makes it possible to efficiently explore complex transformations that combine basic transformations. We report results on two public benchmarks: the CUB dataset of bird images and the ImageNet 2010 challenge. Using Fisher Vector representations, we achieve an improvement from 28.2% to 45.2% in top-1 accuracy on CUB, and an improvement from 70.1% to 74.9% in top-5 accuracy on ImageNet. We also show significant improvements for deep convnet features: from 47.3% to 55.4% on CUB and from 77.9% to 81.4% on ImageNet.
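The greedy selection itself is simple to sketch: repeatedly add the transformation whose augmented copies give the largest validation accuracy gain, and stop when no candidate helps. The toy feature-space "transformations", the logistic-regression proxy classifier, and the stopping rule below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def transformation_pursuit(X, y, transforms, budget=3):
    """Greedily pick the transformations whose augmented copies help most.

    transforms : dict name -> function mapping the feature matrix to a
    transformed copy (stand-ins for image flips, crops, colour jitter, ...).
    """
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    def accuracy(active):
        Xs = [X_tr] + [transforms[name](X_tr) for name in active]
        ys = [y_tr] * (len(active) + 1)
        clf = LogisticRegression(max_iter=1000).fit(np.vstack(Xs), np.concatenate(ys))
        return clf.score(X_val, y_val)

    selected = []
    best = accuracy(selected)
    for _ in range(budget):
        gains = {name: accuracy(selected + [name]) - best
                 for name in transforms if name not in selected}
        name, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:            # stop when no transformation helps any more
            break
        selected.append(name)
        best += gain
    return selected, best

# Toy usage on synthetic vectors; the "transformations" just perturb features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
transforms = {"noise": lambda A: A + rng.normal(scale=0.1, size=A.shape),
              "scale": lambda A: A * 1.1,
              "negate_tail": lambda A: np.hstack([A[:, :10], -A[:, 10:]])}
print(transformation_pursuit(X, y, transforms))
```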
Citations: 100
The Role of Context for Object Detection and Semantic Segmentation in the Wild
Pub Date : 2014-06-23 DOI: 10.1109/CVPR.2014.119
Roozbeh Mottaghi, Xianjie Chen, Xiaobai Liu, Nam-Gyu Cho, Seong-Whan Lee, S. Fidler, R. Urtasun, A. Yuille
In this paper we study the role of context in existing state-of-the-art detection and segmentation approaches. Towards this goal, we label every pixel of the PASCAL VOC 2010 detection challenge with a semantic category. We believe this data will provide plenty of challenges to the community, as it contains 520 additional classes for semantic segmentation and object detection. Our analysis shows that nearest-neighbor-based approaches perform poorly on semantic segmentation of contextual classes, showing the variability of PASCAL imagery. Furthermore, the improvements existing contextual models bring to detection are rather modest. In order to push forward the performance in this difficult scenario, we propose a novel deformable part-based model, which exploits both local context around each candidate detection and global context at the level of the scene. We show that this contextual reasoning significantly helps in detecting objects at all scales.
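As a loose illustration of using global scene context, the sketch below simply blends each detection's appearance score with a scene-level class prior. This is a crude stand-in for the paper's context-aware deformable part-based model; the blending rule, weights, and toy numbers are all assumptions.

```python
import numpy as np

def rescore_with_context(det_scores, det_classes, scene_class_probs, alpha=0.7):
    """Blend each candidate detection's appearance score with a global scene
    prior over object classes (an illustrative stand-in for contextual reasoning)."""
    context = scene_class_probs[det_classes]            # P(class | scene) per detection
    return alpha * det_scores + (1 - alpha) * np.log(context + 1e-6)

# Toy usage: 4 detections in a "street" scene where 'car' is likely, 'cow' is not.
class_names = ["car", "person", "cow"]
scene_probs = np.array([0.6, 0.35, 0.05])
scores = np.array([1.2, 0.8, 1.1, 0.3])
classes = np.array([0, 1, 2, 0])
print(rescore_with_context(scores, classes, scene_probs).round(2))
```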
Citations: 1203