
2015 IEEE International Conference on Computer Vision (ICCV): Latest Publications

Simpler Non-Parametric Methods Provide as Good or Better Results to Multiple-Instance Learning
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.299 Pages: 2605-2613
Ragav Venkatesan, P. S. Chandakkar, Baoxin Li
Multiple-instance learning (MIL) is a unique learning problem in which training-data labels are available only for collections of objects (called bags) instead of individual objects (called instances). A plethora of approaches has been developed to solve this problem over the years; popular methods include diverse density, MILIS and DD-SVM. While widely used, these methods, particularly those in computer vision, have pursued fairly sophisticated solutions tailored to particular configurations of the MIL space. In this paper, we analyze the MIL feature space using modified versions of traditional non-parametric techniques such as the Parzen window and k-nearest-neighbour, and develop a learning approach employing distances to the k nearest neighbours of a point in the feature space. We show that these methods work as well as, if not better than, most recently published methods on benchmark datasets. We compare and contrast our analysis with the well-established diverse-density approach and its variants in the recent literature, using benchmark datasets including the Musk, Andrews' and Corel datasets, along with a diabetic retinopathy pathology diagnosis dataset. Experimental results demonstrate that, while enjoying an intuitive interpretation and supporting fast learning, these methods have the potential to deliver improved performance even for complex data arising from real-world applications.
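The distance-to-nearest-neighbour idea can be illustrated with a minimal bag-level k-NN classifier. This is a hypothetical toy sketch in the spirit of nearest-neighbour MIL methods such as Citation-kNN, not the authors' exact formulation; all names and data below are illustrative.

```python
import math

def bag_dist(bag_a, bag_b):
    # Minimal bag-to-bag distance: the smallest instance-to-instance
    # Euclidean distance between the two bags.
    return min(math.dist(a, b) for a in bag_a for b in bag_b)

def knn_bag_classify(query_bag, train_bags, labels, k=3):
    # Label a query bag by majority vote among its k nearest training bags.
    ranked = sorted(range(len(train_bags)),
                    key=lambda i: bag_dist(query_bag, train_bags[i]))
    votes = [labels[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)

# Toy MIL data: positive bags contain at least one instance near (5, 5).
train = [
    [(0.1, 0.2), (5.0, 5.1)],   # positive
    [(4.9, 5.0), (9.0, 0.0)],   # positive
    [(0.0, 0.1), (1.0, 1.2)],   # negative
    [(2.0, 0.5), (0.3, 1.9)],   # negative
]
labels = [1, 1, 0, 0]
print(knn_bag_classify([(5.1, 4.9), (0.2, 0.0)], train, labels))
```

On this toy problem, a bag is positive exactly when it contains an instance near (5, 5), which the min-distance rule picks up without any parametric model.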
Citations: 13
A Versatile Scene Model with Differentiable Visibility Applied to Generative Pose Estimation
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.94 Pages: 765-773
Helge Rhodin, Nadia Robertini, Christian Richardt, H. Seidel, C. Theobalt
Generative reconstruction methods compute the 3D configuration (such as pose and/or geometry) of a shape by optimizing the overlap of the projected 3D shape model with images. Proper handling of occlusions is a big challenge, since the visibility function that indicates whether a surface point is seen from a camera can often not be formulated in closed form, and is in general discrete and non-differentiable at occlusion boundaries. We present a new scene representation that enables an analytically differentiable closed-form formulation of surface visibility. In contrast to previous methods, this yields pose-similarity energies that are smooth, analytically differentiable, and efficient to optimize, with rigorous occlusion handling, fewer local minima, and experimentally verified improved convergence of the numerical optimization. The underlying idea is a new image-formation model that represents opaque objects by a translucent medium with a smooth Gaussian density distribution, which turns visibility into a smooth phenomenon. We demonstrate the advantages of our versatile scene model in several generative pose estimation problems, namely marker-less multi-object pose estimation, marker-less human motion capture with few cameras, and image-based 3D geometry estimation.
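The smooth-visibility idea can be sketched along a single ray: modelling an occluder as a translucent Gaussian density blob makes the transmittance, and hence visibility, smooth and differentiable everywhere, unlike a hard 0/1 occlusion test. This is a hypothetical 1-D sketch, not the paper's full scene model.

```python
import math

def absorbance(mu, sigma, weight, t):
    # Density of one Gaussian blob centred at depth mu along the ray,
    # integrated up to depth t (closed form via the error function).
    return weight * 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def visibility(blobs, t):
    # Smooth transmittance exp(-total absorbance): differentiable in t and
    # in every blob parameter (mu, sigma, weight).
    return math.exp(-sum(absorbance(mu, s, w, t) for mu, s, w in blobs))

blobs = [(2.0, 0.3, 4.0)]        # one translucent occluder at depth 2
print(visibility(blobs, 1.0))    # point in front of the occluder
print(visibility(blobs, 3.0))    # point behind it
```

A point in front of the blob is nearly fully visible, a point behind it is strongly attenuated, and the transition between the two is smooth rather than a step.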
Citations: 80
Semi-Supervised Normalized Cuts for Image Segmentation
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.200 Pages: 1716-1723
Selene E. Chew, N. Cahill
Since its introduction as a powerful graph-based method for image segmentation, the Normalized Cuts (NCuts) algorithm has been generalized to incorporate expert knowledge about how certain pixels or regions should be grouped, or how the resulting segmentation should be biased to be correlated with priors. Previous approaches incorporate hard must-link constraints on how certain pixels should be grouped as well as hard cannot-link constraints on how other pixels should be separated into different groups. In this paper, we reformulate NCuts to allow both sets of constraints to be handled in a soft manner, enabling the user to tune the degree to which the constraints are satisfied. An approximate spectral solution to the reformulated problem exists without requiring explicit construction of a large, dense matrix; hence, computation time is comparable to that of unconstrained NCuts. Using synthetic data and real imagery, we show that soft handling of constraints yields better results than unconstrained NCuts and enables more robust clustering and segmentation than is possible when the constraints are strictly enforced.
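The effect of handling constraints softly can be illustrated with a small penalised-cut objective, where a weight mu tunes how strictly must-link and cannot-link pairs are enforced. This toy objective is evaluated on explicit partitions; the paper instead solves a spectral relaxation, and all names here are illustrative.

```python
def ncut_value(W, A):
    # Normalized cut of the partition (A, complement) of a weighted graph W.
    n = len(W)
    B = [i for i in range(n) if i not in A]
    assoc = lambda S: sum(W[i][j] for i in S for j in range(n))
    cut = sum(W[i][j] for i in A for j in B)
    return cut / assoc(A) + cut / assoc(B)

def soft_score(W, A, must, cannot, mu):
    # Each violated must-link / cannot-link pair adds a penalty scaled by mu,
    # so constraints bias the solution instead of being strictly enforced.
    violations = sum((i in A) != (j in A) for i, j in must)
    violations += sum((i in A) == (j in A) for i, j in cannot)
    return ncut_value(W, A) + mu * violations

# Two tightly connected pairs (0,1) and (2,3), weakly linked to each other.
W = [[0, 4, 0.1, 0],
     [4, 0, 0, 0.1],
     [0.1, 0, 0, 4],
     [0, 0.1, 4, 0]]
cannot = [(0, 1)]   # expert says nodes 0 and 1 should be separated
print(soft_score(W, {0, 1}, [], cannot, mu=0.5))
print(soft_score(W, {0, 2}, [], cannot, mu=0.5))
```

With a small mu the natural cut {0, 1} still scores best despite violating the constraint; raising mu makes the constraint-respecting partition {0, 2} win, which is exactly the tunable behaviour the abstract describes.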
Citations: 35
Predicting Multiple Structured Visual Interpretations
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.337 Pages: 2947-2955
Debadeepta Dey, V. Ramakrishna, M. Hebert, J. Bagnell
We present a simple approach for producing a small number of structured visual outputs which have high recall, for a variety of tasks including monocular pose estimation and semantic scene segmentation. Current state-of-the-art approaches learn a single model and modify inference procedures to produce a small number of diverse predictions. We take the alternate route of modifying the learning procedure to directly optimize for good, high recall sequences of structured-output predictors. Our approach introduces no new parameters, naturally learns diverse predictions and is not tied to any specific structured learning or inference procedure. We leverage recent advances in the contextual submodular maximization literature to learn a sequence of predictors and empirically demonstrate the simplicity and performance of our approach on multiple challenging vision tasks including achieving state-of-the-art results on multiple predictions for monocular pose-estimation and image foreground/background segmentation.
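The greedy list construction at the heart of contextual submodular maximization can be sketched on a toy coverage problem. Names and data are illustrative, and the paper learns a sequence of predictors rather than selecting from a fixed candidate pool; this only shows why greedy marginal-gain selection yields a short, high-recall list.

```python
def greedy_list(candidates, coverage, budget):
    # Greedy list construction: each step adds the candidate with the
    # largest marginal gain of a monotone submodular coverage function,
    # which carries a (1 - 1/e) approximation guarantee.
    chosen = []
    for _ in range(budget):
        gain = lambda c: coverage(chosen + [c]) - coverage(chosen)
        best = max((c for c in candidates if c not in chosen), key=gain)
        if gain(best) <= 0:
            break
        chosen.append(best)
    return chosen

# Toy task: each candidate prediction "covers" a set of ground-truth items;
# list coverage (recall) is the size of the union of covered items.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}
coverage = lambda chosen: len(set().union(*(cover[c] for c in chosen))) if chosen else 0
print(greedy_list(list(cover), coverage, budget=3))
```

Note that "d" is never picked: its items are already covered by "a", so diverse candidates with fresh coverage win, which is the high-recall behaviour the abstract targets.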
Citations: 27
Variational PatchMatch MultiView Reconstruction and Refinement
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.107 Pages: 882-890
Philipp Heise, B. Jensen, S. Klose, Alois Knoll
In this work we propose a novel approach to the problem of multi-view stereo reconstruction. Building upon the previously proposed PatchMatch stereo and PM-Huber algorithms, we introduce an extension to the multi-view scenario that employs an iterative refinement scheme. Our proposed approach uses an extended and robustified volumetric truncated signed distance function representation, which is advantageous for the fusion of refined depth maps and also for raycasting the current reconstruction estimate together with estimated depth normals into arbitrary camera views. We formulate the combined multi-view stereo reconstruction and refinement as a variational optimization problem. The newly introduced plane-based smoothing term in the energy formulation is guided by the current reconstruction confidence and the image contents. Further, we propose an extension of the PatchMatch scheme with an additional KLT step to avoid unnecessary sampling iterations. Improper camera poses are corrected by a direct image alignment step that performs robust outlier compensation by means of a recently proposed kernel lifting framework. To speed up the optimization of the variational formulation, an adapted scheme is used for faster convergence.
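The PatchMatch inner loop the paper builds on (random initialisation, propagation from neighbours, then random and local search) can be sketched in one dimension. This is a hypothetical toy, far removed from the paper's variational multi-view formulation; all names are illustrative.

```python
import random

def patchmatch_1d(src, dst, patch=3, iters=8, seed=0):
    # Find, for each source patch, an offset into dst with low SSD cost,
    # using PatchMatch-style propagation plus random and local search.
    rng = random.Random(seed)
    n = len(src) - patch + 1
    m = len(dst) - patch + 1
    cost = lambda i, j: sum((src[i + k] - dst[j + k]) ** 2 for k in range(patch))
    nnf = [rng.randrange(m) for _ in range(n)]   # random initialisation
    for _ in range(iters):
        for i in range(n):
            # Propagation: try the left neighbour's match, shifted by one.
            if i > 0 and nnf[i - 1] + 1 < m and cost(i, nnf[i - 1] + 1) < cost(i, nnf[i]):
                nnf[i] = nnf[i - 1] + 1
            # Random search at shrinking radii.
            r = m
            while r >= 1:
                j = min(max(nnf[i] + rng.randint(-r, r), 0), m - 1)
                if cost(i, j) < cost(i, nnf[i]):
                    nnf[i] = j
                r //= 2
            # Local refinement: test the immediate neighbours.
            for j in (nnf[i] - 1, nnf[i] + 1):
                if 0 <= j < m and cost(i, j) < cost(i, nnf[i]):
                    nnf[i] = j
    return nnf

# dst is src shifted right by two samples, so the true offset is i + 2.
print(patchmatch_1d([0, 1, 2, 3, 4, 5], [9, 9, 0, 1, 2, 3, 4, 5]))
```

Moves are only accepted when they strictly lower the match cost, so the field converges to the true shift without ever evaluating all offset pairs, which is the efficiency PatchMatch trades on.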
Citations: 19
Improving Ferns Ensembles by Sparsifying and Quantising Posterior Probabilities
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.467 Pages: 4103-4111
Antonio L. Rodríguez, V. Sequeira
Ferns ensembles offer accurate and efficient multiclass non-linear classification, commonly at the expense of consuming a large amount of memory. We introduce a two-fold contribution that produces large reductions in their memory consumption. First, an efficient L0-regularised cost optimisation finds a sparse representation of the posterior probabilities in the ensemble by discarding elements with zero contribution to valid responses in the training samples. As a by-product, this can produce a prediction-accuracy gain that, if required, can be traded for further reductions in memory size and prediction time. Secondly, posterior probabilities are quantised and stored in a memory-friendly sparse data structure. We report a minimum of 75% memory reduction for different types of classification problems using generative and discriminative ferns ensembles, without increasing prediction time or classification error. For image-patch recognition, our proposal produced a 90% memory reduction and improved prediction accuracy by several percentage points.
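The storage side of the second contribution (quantise surviving posterior entries and keep them sparse) can be sketched as follows. The simple threshold below is only a stand-in for the paper's L0-regularised selection, and the names are illustrative.

```python
def sparsify_quantise(posteriors, threshold=0.05, levels=255):
    # Keep only entries that contribute appreciably and quantise them to a
    # single byte; everything dropped implicitly decodes to zero probability.
    return {key: round(p * levels)
            for key, p in posteriors.items() if p >= threshold}

def lookup(sparse, key, levels=255):
    # Decode the 8-bit code back to a probability; missing entries are zero.
    return sparse.get(key, 0) / levels

# Posteriors indexed by (fern output, class label).
dense = {(0, "cat"): 0.90, (0, "dog"): 0.02, (1, "cat"): 0.55, (1, "dog"): 0.45}
sparse = sparsify_quantise(dense)
print(len(sparse), "of", len(dense), "entries kept")
```

Each kept entry shrinks from a float to one byte and near-zero entries vanish entirely, which is the mechanism behind the reported memory reductions; the quantisation error per entry stays below half a level.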
Citations: 0
A Matrix Decomposition Perspective to Multiple Graph Matching
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.31 Pages: 199-207
Junchi Yan, Hongteng Xu, H. Zha, Xiaokang Yang, Huanxi Liu, Stephen M. Chu
Graph matching has a wide spectrum of real-world applications and is in general known to be NP-hard. In many vision tasks, a practical problem arises: finding the global node mappings across a batch of corrupted weighted graphs. This paper is an attempt to connect graph matching, especially multi-graph matching, to the matrix-decomposition model and its relevant off-the-shelf convex optimization algorithms. Our method aims to extract the common inliers and their synchronized permutations from disordered weighted graphs in the presence of deformation and outliers. Under the proposed framework, several variants can be derived in the hope of accommodating other types of noise. Experimental results on both synthetic data and real images empirically show that the proposed paradigm exhibits several interesting behaviors and in many cases performs competitively with the state of the art.
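The "synchronized permutations" structure can be illustrated with a small permutation-synchronization sketch: generating all pairwise matchings from per-graph matchings to a common reference makes every composition cycle-consistent by construction. This is an illustrative sketch of the consistency property, not the paper's decomposition algorithm.

```python
def compose(p, q):
    # Composition of permutations stored as lists: (p o q)[i] = p[q[i]].
    return [p[q[i]] for i in range(len(q))]

def invert(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return inv

def pairwise_from_reference(to_ref):
    # to_ref[i] maps nodes of graph i onto a common reference graph.
    # Pairwise matchings X_ij = to_ref[j]^(-1) o to_ref[i] are then
    # cycle-consistent by construction: X_jk o X_ij = X_ik.
    n = len(to_ref)
    return {(i, j): compose(invert(to_ref[j]), to_ref[i])
            for i in range(n) for j in range(n)}

to_ref = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
X = pairwise_from_reference(to_ref)
print(X[(0, 1)])
```

Independently estimated pairwise matchings generally violate this cycle property under noise; recovering a consistent set like the one above is what the synchronization view of multi-graph matching formalizes.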
Citations: 33
Deep Neural Decision Forests
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.172 Pages: 1467-1475
P. Kontschieder, M. Fiterau, A. Criminisi, S. R. Bulò
We present Deep Neural Decision Forests - a novel approach that unifies classification trees with the representation learning functionality known from deep convolutional networks, by training them in an end-to-end manner. To combine these two worlds, we introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network. Our model differs from conventional deep networks because a decision forest provides the final predictions and it differs from conventional decision forests since we propose a principled, joint and global optimization of split and leaf node parameters. We show experimental results on benchmark machine learning datasets like MNIST and ImageNet and find on-par or superior results when compared to state-of-the-art deep models. Most remarkably, we obtain Top5-Errors of only 7.84%/6.38% on ImageNet validation data when integrating our forests in a single-crop, single/seven model GoogLeNet architecture, respectively. Thus, even without any form of training data set augmentation we are improving on the 6.67% error obtained by the best GoogLeNet architecture (7 models, 144 crops).
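The core differentiability trick (stochastic routing, where each split sends a sample left with a sigmoid probability and the prediction mixes all leaf distributions by path probability) can be sketched for a depth-2 tree. The split functions here are plain linear units on the raw input, whereas the paper wires them to deep-network activations; all names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_tree_predict(x, split_w, leaf_dist):
    # Depth-2 tree with internal nodes 0 (root), 1, 2 and leaves 0..3.
    # Each node routes x left with probability sigmoid(w . x), so the
    # prediction is a path-probability-weighted mixture of leaf class
    # distributions and is differentiable in both w and the leaves.
    d = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for w in split_w]
    path = [d[0] * d[1],            # left,  left
            d[0] * (1 - d[1]),      # left,  right
            (1 - d[0]) * d[2],      # right, left
            (1 - d[0]) * (1 - d[2])]
    return [sum(path[l] * leaf_dist[l][c] for l in range(4))
            for c in range(len(leaf_dist[0]))]

split_w = [[0.5, 0.3], [1.0, 0.0], [0.0, 1.0]]
leaf_dist = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
pred = soft_tree_predict([1.0, -2.0], split_w, leaf_dist)
print(pred)
```

Because path probabilities sum to one and each leaf holds a distribution, the output is always a valid class distribution, and gradients flow through every split, which is what allows end-to-end training with a preceding network.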
Citations: 458
Interpolation on the Manifold of K Component GMMs
Pub Date: 2015-12-07 DOI: 10.1109/ICCV.2015.330 Pages: 2884-2892
Hyunwoo J. Kim, N. Adluru, Monami Banerjee, B. Vemuri, Vikas Singh
Probability density functions (PDFs) are fundamental "objects" in mathematics with numerous applications in computer vision, machine learning and medical imaging. The feasibility of basic operations such as computing the distance between two PDFs and estimating a mean of a set of PDFs is a direct function of the representation we choose to work with. In this paper, we study the Gaussian mixture model (GMM) representation of PDFs, motivated by its numerous attractive features: (1) GMMs are arguably more interpretable than, say, square-root parameterizations; (2) the model complexity can be explicitly controlled by the number of components; and (3) they are already widely used in many applications. The main contributions of this paper are numerical algorithms to enable basic operations on such objects that strictly respect their underlying geometry. For instance, when operating with a set of k component GMMs, a first-order expectation is that the result of simple operations like interpolation and averaging should provide an object that is also a k component GMM. The literature provides very little guidance on enforcing such requirements systematically. It turns out that these tasks are important internal modules for analysis and processing of a field of ensemble average propagators (EAPs), common in diffusion weighted magnetic resonance imaging. We provide proof of principle experiments showing how the proposed algorithms for interpolation can facilitate statistical analysis of such data, essential to many neuroimaging studies. Separately, we also derive interesting connections of our algorithm with functional spaces of Gaussians, that may be of independent interest.
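The closure requirement the abstract describes (interpolating two k-component GMMs should again yield a k-component GMM) can be seen even in a naive component-wise linear interpolation of parameters. This sketch ignores the manifold geometry that is the paper's actual contribution, and assumes the components of the two mixtures are already in correspondence.

```python
def interpolate_gmms(gmm_a, gmm_b, t):
    # Naive parameter-space interpolation of two k-component 1-D GMMs given
    # as lists of (weight, mean, variance). Closure holds trivially: the
    # result is again a k-component GMM with weights summing to one.
    return [((1 - t) * wa + t * wb,
             (1 - t) * ma + t * mb,
             (1 - t) * va + t * vb)
            for (wa, ma, va), (wb, mb, vb) in zip(gmm_a, gmm_b)]

gmm_a = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
gmm_b = [(0.3, 1.0, 2.0), (0.7, 5.0, 0.5)]
mid = interpolate_gmms(gmm_a, gmm_b, 0.5)
print(mid)
```

The naive path keeps the representation closed but can behave poorly (e.g. it depends on component ordering and ignores the geometry of the parameter space), which is precisely the gap a geometry-aware interpolation on the GMM manifold addresses.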
概率密度函数(pdf)是数学中的基本“对象”,在计算机视觉、机器学习和医学成像中有许多应用。计算两个pdf之间的距离和估计一组pdf的平均值等基本操作的可行性是我们选择使用的表示的直接函数。在本文中,我们研究了高斯混合模型(GMM)表示pdf文件的许多吸引人的特征。(1) gmm可以说比平方根参数化更具可解释性(2)模型复杂性可以通过组件的数量显式地控制(3)它们已经在许多应用程序中广泛使用。本文的主要贡献是数值算法,使这些对象的基本操作严格遵守其底层几何。例如,当对一组k分量GMM进行操作时,一阶期望是插值和平均等简单操作的结果应该提供一个同样是k分量GMM的对象。文献对系统地执行这些要求提供了很少的指导。结果表明,这些任务是扩散加权磁共振成像中常见的系综平均传播子(EAPs)场分析和处理的重要内部模块。我们提供原理实验证明,表明所提出的插值算法如何促进此类数据的统计分析,这对许多神经影像学研究至关重要。另外,我们还推导了我们的算法与高斯函数空间的有趣联系,这可能是独立的兴趣。
Interpolation on the Manifold of K Component GMMs
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.330
Hyunwoo J. Kim, N. Adluru, Monami Banerjee, B. Vemuri, Vikas Singh
Pages: 2884-2892
Citations: 4
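The abstract's "first order expectation" — that interpolating two k-component GMMs should again yield a k-component GMM — can be illustrated with a minimal sketch. This is not the paper's geometry-respecting algorithm; it is a naive component-wise interpolation of 1-D GMMs that assumes the components of the two mixtures are already matched index-to-index (function name and setup are illustrative only). For matched 1-D Gaussian components, linear interpolation of means and standard deviations happens to follow the Wasserstein-2 geodesic per component.

```python
import numpy as np

def interp_gmm(weights_a, means_a, stds_a, weights_b, means_b, stds_b, t):
    """Component-wise interpolation of two matched k-component 1-D GMMs.

    Illustrative sketch only: components are assumed pre-matched by index.
    Means and standard deviations are interpolated linearly (the W2 geodesic
    for 1-D Gaussians); mixing weights are interpolated linearly on the
    simplex and renormalized. The result is, by construction, again a
    k-component GMM.
    """
    w = (1 - t) * np.asarray(weights_a) + t * np.asarray(weights_b)
    mu = (1 - t) * np.asarray(means_a) + t * np.asarray(means_b)
    s = (1 - t) * np.asarray(stds_a) + t * np.asarray(stds_b)
    return w / w.sum(), mu, s

# Midpoint between two 2-component GMMs stays a 2-component GMM.
w, mu, s = interp_gmm([0.3, 0.7], [0.0, 4.0], [1.0, 0.5],
                      [0.5, 0.5], [1.0, 5.0], [0.8, 0.7], t=0.5)
```

The paper's contribution is precisely the harder case this sketch sidesteps: enforcing such closure systematically while respecting the underlying geometry of the GMM manifold.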
Learning to Transfer: Transferring Latent Task Structures and Its Application to Person-Specific Facial Action Unit Detection
Pub Date : 2015-12-07 DOI: 10.1109/ICCV.2015.430
Timur R. Almaev, Brais Martínez, M. Valstar
In this article we explore the problem of constructing person-specific models for the detection of facial Action Units (AUs), addressing the problem from the point of view of Transfer Learning and Multi-Task Learning. Our starting point is the fact that some expressions, such as smiles, are very easily elicited, annotated, and automatically detected, while others are much harder to elicit and to annotate. We thus consider a novel problem: all AU models for the target subject are to be learnt using person-specific annotated data for a reference AU (AU12 in our case), and no data or little data regarding the target AU. In order to design such a model, we propose a novel Multi-Task Learning and the associated Transfer Learning framework, in which we consider both relations across subjects and AUs. That is to say, we consider a tensor structure among the tasks. Our approach hinges on learning the latent relations among tasks using one single reference AU, and then transferring these latent relations to other AUs. We show that we are able to effectively make use of the annotated data for AU12 when learning other person-specific AU models, even in the absence of data for the target task. Finally, we show the excellent performance of our method when small amounts of annotated data for the target tasks are made available.
Pages: 3774-3782
Citations: 41
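The transfer idea in the abstract — learn how subjects relate on a well-annotated reference AU, then reuse that latent relation for a target AU with no person-specific data — can be sketched in a toy linear form. This is not the paper's tensor-structured multi-task method; the function and data shapes below are hypothetical, used only to show the reuse of cross-subject structure across tasks.

```python
import numpy as np

def transfer_target_model(ref_models, tgt_models, ref_model_new):
    """Toy sketch of transferring latent cross-subject structure.

    ref_models: (n_subjects, d) per-subject linear models for the
    reference AU; tgt_models: (n_subjects, d) models for the target AU;
    ref_model_new: (d,) reference-AU model of a new subject who has no
    target-AU data. The new subject's reference model is expressed as a
    linear combination of the other subjects' reference models, and the
    same combination weights are reused on their target-AU models.
    """
    # Least-squares combination coefficients in reference-model space.
    coef, *_ = np.linalg.lstsq(ref_models.T, ref_model_new, rcond=None)
    # Reuse the latent cross-subject relation on the target task.
    return coef @ tgt_models

# Two existing subjects with 2-D models; a new subject halfway between them.
ref_models = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt_models = np.array([[2.0, 0.0], [0.0, 3.0]])
new_tgt = transfer_target_model(ref_models, tgt_models, np.array([0.5, 0.5]))
```

The sketch captures only the transfer direction (reference task to target task via shared subject structure); the paper additionally models relations across AUs, giving the tensor structure among tasks mentioned in the abstract.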
Journal: 2015 IEEE International Conference on Computer Vision (ICCV)