2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops最新文献

英文中文

Using closed captions to train activity recognizers that improve video retrieval 使用封闭字幕训练活动识别器，提高视频检索

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204202

S. Gupta, R. Mooney

Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle & zoom, rapid camera movements etc. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This paper explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting `labeled' data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an on-going activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.

在现实世界的视频中识别活动是一个困难的问题，由于背景混乱、摄像机角度和变焦的变化、摄像机快速移动等原因而加剧。大型标记视频语料库可用于训练自动活动识别系统，但这需要昂贵的人力和时间。本文探讨了许多视频自然伴随的封闭字幕如何作为弱监督，允许自动收集“标记”数据以进行活动识别。我们证明了这种方法可以提高足球视频的活动检索。我们的系统不需要对视频片段进行人工标记，只需要最少的人工监督。我们还提出了一种新的标题分类器，它使用额外的语言信息来确定特定评论是否指的是正在进行的活动。我们证明了语言分析和自动训练的活动识别器相结合可以显著提高视频检索的精度。

引用次数: 16

An implicit spatiotemporal shape model for human activity localization and recognition 人类活动定位与识别的隐式时空形状模型

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204262

A. Oikonomopoulos, I. Patras, M. Pantic

In this paper we address the problem of localisation and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity which relies on the spatiotemporal localization of characteristic, sparse, `visual words' and `visual verbs'. Evidence for the spatiotemporal localization of the activity are accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, where we take the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation in order to spatially segment the subject that performs the activities in every frame, and the Radon transform in order to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.

在本文中，我们解决了未分割图像序列中人类活动的定位和识别问题。所提出的方法的主要贡献是使用活动的时空形状的隐式表示，它依赖于特征，稀疏，“视觉词”和“视觉动词”的时空定位。在概率时空投票方案中积累了活动时空定位的证据。我们的投票框架的本地特性允许我们恢复发生在同一场景中的多个活动，以及存在混乱和闭塞的活动。我们使用训练集中的描述符构建特定于类的码本，其中我们考虑了码字对的空间共现。码字对相对于对象中心的位置，以及它们在训练集中发生的帧随后被存储，以便创建码字共现的时空模型。在测试阶段，我们使用均值移位模式估计来对每一帧中执行活动的主体进行空间分割，并使用Radon变换来提取关于连续流中活动的时间分割的最可能假设。

{"title":"An implicit spatiotemporal shape model for human activity localization and recognition","authors":"A. Oikonomopoulos, I. Patras, M. Pantic","doi":"10.1109/CVPRW.2009.5204262","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204262","url":null,"abstract":"In this paper we address the problem of localisation and recognition of human activities in unsegmented image sequences. The main contribution of the proposed method is the use of an implicit representation of the spatiotemporal shape of the activity which relies on the spatiotemporal localization of characteristic, sparse, `visual words' and `visual verbs'. Evidence for the spatiotemporal localization of the activity are accumulated in a probabilistic spatiotemporal voting scheme. The local nature of our voting framework allows us to recover multiple activities that take place in the same scene, as well as activities in the presence of clutter and occlusions. We construct class-specific codebooks using the descriptors in the training set, where we take the spatial co-occurrences of pairs of codewords into account. The positions of the codeword pairs with respect to the object centre, as well as the frame in the training set in which they occur are subsequently stored in order to create a spatiotemporal model of codeword co-occurrences. During the testing phase, we use mean shift mode estimation in order to spatially segment the subject that performs the activities in every frame, and the Radon transform in order to extract the most probable hypotheses concerning the temporal segmentation of the activities within the continuous stream.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133820322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 35

GPU-accelerated, gradient-free MI deformable registration for atlas-based MR brain image segmentation 基于阿特拉斯的磁共振脑图像分割的gpu加速，无梯度MI可变形配准

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204043

Xiao Han, L. Hibbard, V. Willcut

Brain structure segmentation is an important task in many neuroscience and clinical applications. In this paper, we introduce a novel MI-based dense deformable registration method and apply it to the automatic segmentation of detailed brain structures. Together with a multiple atlas fusion strategy, very accurate segmentation results were obtained, as compared with other reported methods in the literature. To make multi-atlas segmentation computationally feasible, we also propose to take advantage of the recent advancements in GPU technology and introduce a GPU-based implementation of the proposed registration method. With GPU acceleration it takes less than 8 minutes to compile a multi-atlas segmentation for each subject even with as many as 17 atlases, which demonstrates that the use of GPUs can greatly facilitate the application of such atlas-based segmentation methods in practice.

脑结构分割是许多神经科学和临床应用中的重要任务。本文提出了一种新颖的基于mi的密集形变配准方法，并将其应用于脑结构细节的自动分割。与文献中报道的其他方法相比，结合多图谱融合策略，获得了非常准确的分割结果。为了使多图谱分割在计算上可行，我们还建议利用GPU技术的最新进展，并引入基于GPU的实现所提出的配准方法。在GPU加速的情况下，即使多达17个地图集，也可以在不到8分钟的时间内编译出每个主题的多地图集分割，这表明使用GPU可以极大地促进这种基于地图集的分割方法在实践中的应用。

引用次数: 33

Is there a general structure for grammars? 语法有一个通用的结构吗?

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204334

D. Mumford

Summary form only given. Linguists have proposed dozens of formalisms for grammars and now vision is weighing in with its versions based on its needs. Ulf Grenander has proposed general pattern theory, and has used grammar-like graphical parses of "thoughts" in the style of AI. One wants a natural, simple formalism treating all these cases. I want to pose this as a central problem in modeling intelligence. Pattern theory started in the 70's with the ideas of Ulf Grenander and his school at Brown. The aim is to analyze from a statistical point of view the patterns in all "signals" generated by the world, whether they be images, sounds, written text, DNA or protein strings, spike trains in neurons, time series of prices or weather, etc. Pattern theory proposes that the types of patterns-and the hidden variables needed to describe these patterns - found in one class of signals will often be found in the others and that their characteristic variability will be similar. The underlying idea is to find classes of stochastic models which can capture all the patterns that we see in nature, so that random samples from these models have the same "look and feel" as the samples from the world itself. Then the detection of patterns in noisy and ambiguous samples can be achieved by the use of Bayes' rule, a method that can be described as "analysis by synthesis".

只提供摘要形式。语言学家已经提出了几十种语法形式，现在vision正在根据自己的需要推出不同的版本。Ulf Grenander提出了通用模式理论，并以人工智能的方式使用类似语法的图形化“思想”解析。人们需要一种自然的，简单的形式主义来处理所有这些情况。我想把它作为智能建模的一个核心问题。模式理论始于70年代，由Ulf Grenander和他在布朗大学的学派提出。其目的是从统计学的角度分析世界产生的所有“信号”的模式，无论这些信号是图像、声音、书面文本、DNA或蛋白质串、神经元的尖峰序列、价格或天气的时间序列等等。模式理论提出，在一类信号中发现的模式类型——以及描述这些模式所需的隐藏变量——通常会在其他信号中发现，并且它们的特征可变性将是相似的。潜在的想法是找到能够捕获我们在自然界中看到的所有模式的随机模型类别，因此这些模型中的随机样本与来自世界本身的样本具有相同的“外观和感觉”。然后利用贝叶斯规则实现对噪声和模糊样本的模式检测，这种方法可以被描述为“综合分析”。

{"title":"Is there a general structure for grammars?","authors":"D. Mumford","doi":"10.1109/CVPRW.2009.5204334","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204334","url":null,"abstract":"Summary form only given. Linguists have proposed dozens of formalisms for grammars and now vision is weighing in with its versions based on its needs. Ulf Grenander has proposed general pattern theory, and has used grammar-like graphical parses of \"thoughts\" in the style of AI. One wants a natural, simple formalism treating all these cases. I want to pose this as a central problem in modeling intelligence. Pattern theory started in the 70's with the ideas of Ulf Grenander and his school at Brown. The aim is to analyze from a statistical point of view the patterns in all \"signals\" generated by the world, whether they be images, sounds, written text, DNA or protein strings, spike trains in neurons, time series of prices or weather, etc. Pattern theory proposes that the types of patterns-and the hidden variables needed to describe these patterns - found in one class of signals will often be found in the others and that their characteristic variability will be similar. The underlying idea is to find classes of stochastic models which can capture all the patterns that we see in nature, so that random samples from these models have the same \"look and feel\" as the samples from the world itself. Then the detection of patterns in noisy and ambiguous samples can be achieved by the use of Bayes' rule, a method that can be described as \"analysis by synthesis\".","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123248761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

3D stochastic completion fields for fiber tractography 纤维束成像的三维随机完井场

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204044

P. MomayyezSiahkal, Kaleem Siddiqi

We approach the problem of fiber tractography from the viewpoint that a computational theory should relate to the underlying quantity that is being measured - the diffusion of water molecules. We characterize the Brownian motion of water by a 3D random walk described by a stochastic non-linear differential equation. We show that the maximum-likelihood trajectories are 3D elastica, or curves of least energy. We illustrate the model with Monte-Carlo (sequential) simulations and then develop a more efficient (local, parallelizable) implementation, based on the Fokker-Planck equation. The final algorithm allows us to efficiently compute stochastic completion fields to connect a source region to a sink region, while taking into account the underlying diffusion MRI data. We demonstrate promising tractography results using high angular resolution diffusion data as input.

我们从计算理论应该与被测量的潜在量——水分子的扩散——相关的观点来处理纤维束图的问题。我们用随机非线性微分方程描述的三维随机游走来表征水的布朗运动。我们表明，最大似然轨迹是三维弹性的，或能量最小的曲线。我们用蒙特卡罗(顺序)模拟说明了该模型，然后基于Fokker-Planck方程开发了一个更有效的(局部的，可并行化的)实现。最后的算法允许我们有效地计算随机补全场，将源区域连接到汇聚区域，同时考虑到潜在的扩散MRI数据。我们展示了有希望的牵引成像结果使用高角分辨率的扩散数据作为输入。

引用次数: 10

Nonparametric bottom-up saliency detection by self-resemblance 基于自相似的非参数自底向上显著性检测

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204207

H. Seo, P. Milanfar

We present a novel bottom-up saliency detection algorithm. Our method computes so-called local regression kernels (i.e., local features) from the given image, which measure the likeness of a pixel to its surroundings. Visual saliency is then computed using the said “self-resemblance” measure. The framework results in a saliency map where each pixel indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices. As a similarity measure, matrix cosine similarity (a generalization of cosine similarity) is employed. State of the art performance is demonstrated on commonly used human eye fixation data [3] and some psychological patterns.

提出了一种新颖的自下而上显著性检测算法。我们的方法从给定的图像中计算所谓的局部回归核(即局部特征)，它测量像素与其周围环境的相似性。然后使用上述“自相似”测量来计算视觉显著性。该框架生成显著性图，其中每个像素表示给定其周围特征矩阵的特征矩阵显著性的统计可能性。采用矩阵余弦相似度(余弦相似度的一种推广)作为相似度度量。在常用的人眼注视数据[3]和一些心理模式上展示了最新的表现。

引用次数: 116

Robust feature matching in 2.3µs 在2.3µs内实现鲁棒特征匹配

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204314

S. Taylor, E. Rosten, T. Drummond

In this paper we present a robust feature matching scheme in which features can be matched in 2.3µs. For a typical task involving 150 features per image, this results in a processing time of 500µs for feature extraction and matching. In order to achieve very fast matching we use simple features based on histograms of pixel intensities and an indexing scheme based on their joint distribution. The features are stored with a novel bit mask representation which requires only 44 bytes of memory per feature and allows computation of a dissimilarity score in 20ns. A training phase gives the patch-based features invariance to small viewpoint variations. Larger viewpoint variations are handled by training entirely independent sets of features from different viewpoints. A complete system is presented where a database of around 13,000 features is used to robustly localise a single planar target in just over a millisecond, including all steps from feature detection to model fitting. The resulting system shows comparable robustness to SIFT [8] and Ferns [14] while using a tiny fraction of the processing time, and in the latter case a fraction of the memory as well.

在本文中，我们提出了一种鲁棒的特征匹配方案，该方案可以在2.3µs内匹配特征。对于每张图像涉及150个特征的典型任务，这导致特征提取和匹配的处理时间为500µs。为了实现快速匹配，我们使用基于像素强度直方图的简单特征和基于它们的联合分布的索引方案。特征以一种新颖的位掩码表示方式存储，每个特征只需要44字节的内存，并允许在20ns内计算不同的分数。训练阶段使基于补丁的特征对小的视点变化具有不变性。通过训练来自不同视点的完全独立的特征集来处理较大的视点变化。提出了一个完整的系统，其中使用大约13,000个特征数据库在一毫秒多一点的时间内对单个平面目标进行鲁棒定位，包括从特征检测到模型拟合的所有步骤。所得到的系统显示出与SIFT[8]和Ferns[14]相当的鲁棒性，同时使用很小一部分的处理时间，在后者的情况下也使用一小部分的内存。

{"title":"Robust feature matching in 2.3µs","authors":"S. Taylor, E. Rosten, T. Drummond","doi":"10.1109/CVPRW.2009.5204314","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204314","url":null,"abstract":"In this paper we present a robust feature matching scheme in which features can be matched in 2.3µs. For a typical task involving 150 features per image, this results in a processing time of 500µs for feature extraction and matching. In order to achieve very fast matching we use simple features based on histograms of pixel intensities and an indexing scheme based on their joint distribution. The features are stored with a novel bit mask representation which requires only 44 bytes of memory per feature and allows computation of a dissimilarity score in 20ns. A training phase gives the patch-based features invariance to small viewpoint variations. Larger viewpoint variations are handled by training entirely independent sets of features from different viewpoints. A complete system is presented where a database of around 13,000 features is used to robustly localise a single planar target in just over a millisecond, including all steps from feature detection to model fitting. The resulting system shows comparable robustness to SIFT [8] and Ferns [14] while using a tiny fraction of the processing time, and in the latter case a fraction of the memory as well.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125116364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 78

A method for selecting and ranking quality metrics for optimization of biometric recognition systems 一种用于优化生物特征识别系统的质量度量的选择和排序方法

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204309

N. Schmid, Francesco Nicolo

In the field of biometrics evaluation of quality of biometric samples has a number of important applications. The main applications include (1) to reject poor quality images during acquisition, (2) to use as enhancement metric, and (3) to apply as a weighting factor in fusion schemes. Since a biometric-based recognition system relies on measures of performance such as matching scores and recognition probability of error, it becomes intuitive that the metrics evaluating biometric sample quality have to be linked to the recognition performance of the system. The goal of this work is to design a method for evaluating and ranking various quality metrics applied to biometric images or signals based on their ability to predict recognition performance of a biometric recognition system. The proposed method involves: (1) Preprocessing algorithm operating on pairs of quality scores and generating relative scores, (2) Adaptive multivariate mapping relating quality scores and measures of recognition performance and (3) Ranking algorithm that selects the best combinations of quality measures. The performance of the method is demonstrated on face and iris biometric data.

在生物识别领域中，生物识别样品的质量评价有着许多重要的应用。主要应用包括(1)在采集过程中剔除质量差的图像，(2)用作增强度量，以及(3)作为融合方案中的加权因子。由于基于生物特征的识别系统依赖于诸如匹配分数和识别错误概率等性能度量，因此评估生物特征样本质量的度量必须与系统的识别性能联系起来，这变得很直观。这项工作的目标是设计一种方法，根据生物特征图像或信号预测识别性能的能力，对应用于生物特征图像或信号的各种质量指标进行评估和排序。该方法包括:(1)对质量分数对进行预处理并生成相对分数的算法;(2)质量分数与识别性能指标之间的自适应多变量映射;(3)选择最佳质量指标组合的排序算法。在人脸和虹膜生物特征数据上验证了该方法的有效性。

{"title":"A method for selecting and ranking quality metrics for optimization of biometric recognition systems","authors":"N. Schmid, Francesco Nicolo","doi":"10.1109/CVPRW.2009.5204309","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204309","url":null,"abstract":"In the field of biometrics evaluation of quality of biometric samples has a number of important applications. The main applications include (1) to reject poor quality images during acquisition, (2) to use as enhancement metric, and (3) to apply as a weighting factor in fusion schemes. Since a biometric-based recognition system relies on measures of performance such as matching scores and recognition probability of error, it becomes intuitive that the metrics evaluating biometric sample quality have to be linked to the recognition performance of the system. The goal of this work is to design a method for evaluating and ranking various quality metrics applied to biometric images or signals based on their ability to predict recognition performance of a biometric recognition system. The proposed method involves: (1) Preprocessing algorithm operating on pairs of quality scores and generating relative scores, (2) Adaptive multivariate mapping relating quality scores and measures of recognition performance and (3) Ranking algorithm that selects the best combinations of quality measures. The performance of the method is demonstrated on face and iris biometric data.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128244961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Multiple label prediction for image annotation with multiple Kernel correlation models 基于多核相关模型的图像标注多标签预测

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204274

Oksana Yakhnenko, Vasant G Honavar

Image annotation is a challenging task that allows to correlate text keywords with an image. In this paper we address the problem of image annotation using Kernel Multiple Linear Regression model. Multiple Linear Regression (MLR) model reconstructs image caption from an image by performing a linear transformation of an image into some semantic space, and then recovers the caption by performing another linear transformation from the semantic space into the label space. The model is trained so that model parameters minimize the error of reconstruction directly. This model is related to Canonical Correlation Analysis (CCA) which maps both images and caption into the semantic space to minimize the distance of mapping in the semantic space. Kernel trick is then used for the MLR resulting in Kernel Multiple Linear Regression model. The solution to KMLR is a solution to the generalized eigen-value problem, related to KCCA (Kernel Canonical Correlation Analysis). We then extend Kernel Multiple Linear Regression and Kernel Canonical Correlation analysis models to multiple kernel setting, to allow various representations of images and captions. We present results for image annotation using Multiple Kernel Learning CCA and MLR on Oliva and Torralba (2001) scene recognition that show kernel selection behaviour.

图像注释是一项具有挑战性的任务，它允许将文本关键字与图像关联起来。本文利用核多元线性回归模型解决了图像标注问题。多元线性回归(Multiple Linear Regression, MLR)模型通过对图像进行某种语义空间的线性变换来重建图像标题，然后对图像进行另一次从语义空间到标签空间的线性变换来恢复图像标题。对模型进行训练，使模型参数直接减小重构误差。该模型与典型相关分析(CCA)有关，典型相关分析将图像和标题都映射到语义空间中，以最小化语义空间中的映射距离。然后将核技巧用于MLR，从而得到核多元线性回归模型。KMLR的解是广义特征值问题的解，与核典型相关分析(KCCA)有关。然后，我们将核多元线性回归和核典型相关分析模型扩展到多个核设置，以允许图像和标题的各种表示。我们展示了在Oliva和Torralba(2001)场景识别上使用多核学习CCA和MLR进行图像注释的结果，这些结果显示了核选择行为。

{"title":"Multiple label prediction for image annotation with multiple Kernel correlation models","authors":"Oksana Yakhnenko, Vasant G Honavar","doi":"10.1109/CVPRW.2009.5204274","DOIUrl":"https://doi.org/10.1109/CVPRW.2009.5204274","url":null,"abstract":"Image annotation is a challenging task that allows to correlate text keywords with an image. In this paper we address the problem of image annotation using Kernel Multiple Linear Regression model. Multiple Linear Regression (MLR) model reconstructs image caption from an image by performing a linear transformation of an image into some semantic space, and then recovers the caption by performing another linear transformation from the semantic space into the label space. The model is trained so that model parameters minimize the error of reconstruction directly. This model is related to Canonical Correlation Analysis (CCA) which maps both images and caption into the semantic space to minimize the distance of mapping in the semantic space. Kernel trick is then used for the MLR resulting in Kernel Multiple Linear Regression model. The solution to KMLR is a solution to the generalized eigen-value problem, related to KCCA (Kernel Canonical Correlation Analysis). We then extend Kernel Multiple Linear Regression and Kernel Canonical Correlation analysis models to multiple kernel setting, to allow various representations of images and captions. We present results for image annotation using Multiple Kernel Learning CCA and MLR on Oliva and Torralba (2001) scene recognition that show kernel selection behaviour.","PeriodicalId":431981,"journal":{"name":"2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130887149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

On conversion from color to gray-scale images for face detection 用于人脸检测的彩色图像到灰度图像的转换

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

Pub Date : 2009-06-20 DOI: 10.1109/CVPRW.2009.5204297

Juwei Lu, K. Plataniotis

The paper presents a study on color to gray image conversion from a novel point of view: face detection. To the best knowledge of the authors, research in such a specific topic has not been conducted before. Our work reveals that the standard NTSC conversion is not optimal for face detection tasks, although it may be the best for use to display pictures on monochrome televisions. It is further found experimentally with two AdaBoost-based face detection systems that the detect rates may vary up to 10% by simply changing the parameters of the RGB to Gray conversion. On the other hand, the change has little influence on the false positive rates. Compared to the standard NTSC conversion, the detect rate with the best found parameter setting is 2.85% and 3.58% higher for the two evaluated face detection systems. Promisingly, the work suggests a new solution to the color to gray conversion. It could be extremely easy to be incorporated into most existing face detection systems for accuracy improvement without introduction of any extra cost in computational complexity.

本文从一个新的角度——人脸检测——研究了彩色图像到灰度图像的转换。据作者所知，在这样一个特定主题的研究之前还没有进行过。我们的研究表明，标准的NTSC转换并不是人脸检测任务的最佳选择，尽管它可能是单色电视上显示图片的最佳选择。通过两个基于adaboost的人脸检测系统的实验进一步发现，通过简单地改变RGB到Gray转换的参数，检测率可以变化高达10%。另一方面，这种变化对假阳性率的影响很小。与标准NTSC转换相比，两种被评估的人脸检测系统在最佳发现参数设置下的检测率分别高出2.85%和3.58%。有希望的是，这项工作提出了一种新的解决方案，以颜色到灰色的转换。它可以非常容易地整合到大多数现有的人脸检测系统中，以提高准确性，而不会引入任何额外的计算复杂性成本。

引用次数: 29

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀