
Latest publications: 2009 IEEE Conference on Computer Vision and Pattern Recognition

Shape discovery from unlabeled image collections
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206698
Yong Jae Lee, K. Grauman
Can we discover common object shapes within unlabeled multi-category collections of images? While often a critical cue at the category level, contour matches can be difficult to isolate reliably from edge clutter, even within labeled images from a known class, let alone unlabeled examples. We propose a shape discovery method in which local appearance (patch) matches serve to anchor the surrounding edge fragments, yielding a more reliable affinity function for images that accounts for both shape and appearance. Spectral clustering from the initial affinities provides candidate object clusters. Then, we compute the within-cluster match patterns to discern foreground edges from clutter, attributing higher weight to edges more likely to belong to a common object. In addition to discovering the object contours in each image, we show how to summarize what is found with prototypical shapes. Our results on benchmark datasets demonstrate the approach can successfully discover shapes from unlabeled images.
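The clustering stage the abstract describes (spectral clustering on an image-to-image affinity matrix) can be sketched generically. The block below is not the authors' pipeline; their affinity combines contour and patch-appearance cues, and a toy block-structured affinity stands in for it here:

```python
import numpy as np

def spectral_clusters(A, k):
    """Cluster n items given a symmetric n x n affinity matrix A, using the
    standard normalized spectral-clustering recipe: embed each item with the
    top eigenvectors of the normalized affinity, then run a small k-means in
    that embedding."""
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))            # symmetrically normalized affinity
    _, vecs = np.linalg.eigh(S)                # eigenvalues in ascending order
    X = vecs[:, -k:]                           # top-k eigenvectors as embedding
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)

    # deterministic farthest-point seeding, then plain k-means on rows of X
    centers = [X[0]]
    for _ in range(1, k):
        dmin = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dmin)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(50):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# toy affinity: items 0-2 strongly related to each other, items 3-5 likewise
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
labels = spectral_clusters(A, 2)
```

With a clear two-block affinity, the two groups come out as the two candidate clusters.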
Citations: 78
Linear spatial pyramid matching using sparse coding for image classification
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206757
Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang
Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity of O(n²~n³) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks by using a single type of descriptor.
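The pooling step that turns per-descriptor sparse codes into a single linear-SPM feature vector can be sketched as follows. Dictionary learning, SIFT extraction, and the linear SVM are omitted, and the descriptor coordinates and codes below are toy values:

```python
import numpy as np

def spm_max_pool(codes, xs, ys, levels=(1, 2, 4)):
    """Pool per-descriptor sparse codes into one spatial-pyramid feature.

    codes : (n, k) nonnegative sparse codes of n local descriptors
    xs, ys: descriptor coordinates normalized to [0, 1)
    For each pyramid level L the image is split into an L x L grid, the codes
    inside each cell are max-pooled, and all cells are concatenated."""
    feats = []
    for L in levels:
        cx = np.minimum((xs * L).astype(int), L - 1)
        cy = np.minimum((ys * L).astype(int), L - 1)
        for i in range(L):
            for j in range(L):
                mask = (cx == i) & (cy == j)
                cell = codes[mask].max(axis=0) if mask.any() else np.zeros(codes.shape[1])
                feats.append(cell)
    return np.concatenate(feats)

# toy example: 4 descriptors with 3-dimensional sparse codes, one per quadrant
codes = np.array([[0.9, 0.0, 0.1],
                  [0.0, 0.8, 0.0],
                  [0.2, 0.0, 0.7],
                  [0.5, 0.5, 0.0]])
xs = np.array([0.1, 0.9, 0.1, 0.9])
ys = np.array([0.1, 0.1, 0.9, 0.9])
f = spm_max_pool(codes, xs, ys)
```

The resulting vector has (1 + 4 + 16) cells x 3 code dimensions = 63 entries; a linear kernel on such vectors is the "linear SPM kernel" of the paper.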
Citations: 3136
Stereographic rectification of omnidirectional stereo pairs
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206530
Jan Heller, T. Pajdla
We present a general technique for rectification of a stereo pair acquired by a calibrated omnidirectional camera. Using this technique we formulate a new stereographic rectification method. Our rectification does not map epipolar curves onto lines as common rectification methods, but rather maps epipolar curves onto circles. We show that this rectification in a certain sense minimizes the distortion of the original omnidirectional images. We formulate the rectification for multiple images and show that the choice of the optimal projection center of the rectification is under certain circumstances equivalent to the classical problem of spherical minimax location. We demonstrate the behaviour and the quality of the rectification in real experiments with images from 180 degree field of view fish eye lenses.
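The circle-preserving property of stereographic projection that the rectification relies on can be verified numerically. This is a generic sketch (projection from the north pole of the unit sphere onto the z = 0 plane), not the authors' rectification method:

```python
import numpy as np

def stereographic(points):
    """Project unit-sphere points from the north pole (0, 0, 1) onto the
    z = 0 plane: (x, y, z) -> (x, y) / (1 - z). Circles on the sphere that
    avoid the pole map to circles in the plane."""
    s = 1.0 / (1.0 - points[:, 2])
    return points[:, :2] * s[:, None]

def sphere_circle(normal, dist, num=8):
    """Sample a circle on the unit sphere: its intersection with the plane
    of unit normal `normal` at signed distance `dist` from the origin."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    helper = np.array([0.0, 1.0, 0.0]) if abs(n[0]) > 0.9 else np.array([1.0, 0.0, 0.0])
    e1 = np.cross(n, helper); e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    r = np.sqrt(1.0 - dist * dist)
    t = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    return dist * n + r * (np.cos(t)[:, None] * e1 + np.sin(t)[:, None] * e2)

# a tilted circle on the sphere, projected to the plane ...
pts = stereographic(sphere_circle([1.0, 0.0, 1.0], 0.2))

# ... is still a circle: the algebraic fit x^2 + y^2 = 2ax + 2by + c is exact
A = np.column_stack([2.0 * pts[:, 0], 2.0 * pts[:, 1], np.ones(len(pts))])
rhs = (pts ** 2).sum(axis=1)
(a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
radii = np.linalg.norm(pts - np.array([a, b]), axis=1)
```

All projected samples are equidistant from the fitted center, i.e. the epipolar-curve image is a circle rather than a general curve.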
Citations: 18
Active volume models for 3D medical image segmentation
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206563
Tian Shen, Hongsheng Li, Z. Qian, Xiaolei Huang
In this paper, we propose a novel predictive model for object boundaries, which can integrate information from any source. The model is a dynamic “object” model whose manifestation includes a deformable surface representing shape, a volumetric interior carrying appearance statistics, and an embedded classifier that separates object from background based on current feature information. Unlike Snakes, Level Set, Graph Cut, MRF and CRF approaches, the model is “self-contained” in that it does not model the background, but rather focuses on an accurate representation of the foreground object's attributes. As we will show, however, the model is capable of reasoning about the background statistics and can thus detect when a change is sufficient to invoke a boundary decision. The shape of the 3D model is considered as an elastic solid, with a simplex-mesh (i.e. finite element triangulation) surface made of thousands of vertices. Deformations of the model are derived from a linear system that encodes external forces from the boundary of a Region of Interest (ROI), which is a binary mask representing the object region predicted by the current model. Efficient optimization and fast convergence of the model are achieved using the Finite Element Method (FEM). Other advantages of the model include the ease of dealing with topology changes and its ability to incorporate human interactions. Segmentation and validation results are presented for experiments on noisy 3D medical images.
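The core numerical step, deforming the model by repeatedly solving a linear system that balances internal smoothness against external forces, can be illustrated on a 2-D closed contour. The paper's model is a 3-D simplex mesh with ROI-derived forces, so everything below (the radial toy force, the step sizes) is an illustrative assumption:

```python
import numpy as np

def evolve_contour(X, forces, alpha=0.5, tau=0.1, iters=100):
    """Evolve a closed 2-D contour toward external forces by solving a
    linear system per step (semi-implicit Euler). X is an (n, 2) array of
    vertices; forces(X) returns an (n, 2) array of external forces. Treating
    the internal smoothing term implicitly keeps large steps stable."""
    n = len(X)
    # cyclic Laplacian of the closed polygon: (Lx)_i = x_{i-1} + x_{i+1} - 2 x_i
    L = -2.0 * np.eye(n) + np.roll(np.eye(n), 1, axis=0) + np.roll(np.eye(n), -1, axis=0)
    A = np.eye(n) - tau * alpha * L        # (I - tau*alpha*L) x_new = x + tau*f(x)
    for _ in range(iters):
        X = np.linalg.solve(A, X + tau * forces(X))
    return X

def radial_force(X):
    """Toy external force pulling each vertex toward the unit circle."""
    r = np.linalg.norm(X, axis=1, keepdims=True)
    return (1.0 - r) * X / (r + 1e-12)

t = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
X0 = np.stack([2.0 * np.cos(t), 2.0 * np.sin(t)], axis=1)   # start at radius 2
Xf = evolve_contour(X0, radial_force)
radii = np.linalg.norm(Xf, axis=1)
```

The contour settles just inside the unit circle, where the external pull balances the internal smoothing, the same trade-off the FEM system resolves in 3-D.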
Citations: 28
A projector-based movable hand-held display system
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206658
M. Leung, K. Lee, K. Wong, M. Chang
In this paper, we propose a movable hand-held display system which uses a projector to project display content onto an ordinary piece of cardboard that can move freely within the projection area. Such a system gives users greater freedom in controlling the display, such as the viewing angle and distance. At the same time, the cardboard can be cut to a size that fits one's application. A projector-camera pair is calibrated and used as the tracking and projection system. We present a vision-based algorithm to detect an ordinary cardboard and track its subsequent motion. Display content is then pre-warped and projected onto the cardboard at the correct position. Experimental results show that our system can project onto the cardboard with reasonable precision.
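The pre-warp is a planar homography between the display content and the tracked cardboard. A standard way to estimate it from four tracked corners is the direct linear transform (DLT); the corner coordinates below are hypothetical:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst from >= 4 point
    correspondences via the direct linear transform: stack two equations per
    correspondence and take the null vector of the system by SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.array(rows))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, pts):
    """Apply a homography to (n, 2) points, with perspective division."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# unit square of display content -> hypothetical tracked cardboard corners
src = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dst = np.array([[10, 12], [52, 15], [48, 60], [8, 55]], dtype=float)
H = homography_dlt(src, dst)
```

Warping the content by H before projection makes it land undistorted on the cardboard, assuming the surface is planar and the corners are tracked accurately.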
Citations: 23
Discriminatively trained particle filters for complex multi-object tracking
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206801
Robin Hess, Alan Fern
This work presents a discriminative training method for particle filters in the context of multi-object tracking. We are motivated by the difficulty of hand-tuning the many model parameters for such applications and also by results in many application domains indicating that discriminative training is often superior to generative training methods. Our learning approach is tightly integrated into the actual inference process of the filter and attempts to directly optimize the filter parameters in response to observed errors. We present experimental results in the challenging domain of American football where our filter is trained to track all 22 players throughout football plays. The training method is shown to significantly improve performance of the tracker and to significantly outperform two recent particle-based multi-object tracking methods.
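The inference loop that the training method tunes is a standard particle filter. A minimal bootstrap filter for a 1-D random-walk state shows the predict/weight/resample cycle; the paper's filter is a hand-designed multi-object football tracker, so this is only the generic skeleton:

```python
import numpy as np

def particle_filter(observations, n_particles=500, motion_std=1.0,
                    obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random-walk state: predict by
    propagating particles through the motion model, weight them by the
    observation likelihood, estimate the state as the weighted mean, and
    resample particles in proportion to their weights."""
    rng = np.random.default_rng(seed)
    particles = np.zeros(n_particles)
    estimates = []
    for z in observations:
        particles = particles + rng.normal(0.0, motion_std, n_particles)   # predict
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)                # weight
        w = w / w.sum()
        estimates.append(np.sum(w * particles))
        particles = rng.choice(particles, size=n_particles, p=w)           # resample
    return np.array(estimates)

# track a slowly drifting 1-D state observed with noise
rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(0.0, 0.5, 50))
obs = truth + rng.normal(0.0, 1.0, 50)
est = particle_filter(obs)
```

The motion and observation standard deviations here are exactly the kind of hand-set parameters the paper proposes to learn discriminatively from observed tracking errors.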
Citations: 135
Image categorization by learning with context and consistency
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206851
Zhiwu Lu, H. Ip
This paper presents a novel semi-supervised learning method which can make use of intra-image semantic context and inter-image cluster consistency for image categorization with less labeled data. The image representation is first formed with the visual keywords generated by clustering all the blocks that we divide images into. The 2D spatial Markov chain model is then proposed to capture the semantic context across these keywords within an image. To develop a graph-based semi-supervised learning approach to image categorization, we incorporate the intra-image semantic context into a kind of spatial Markov kernel which can be used as the affinity matrix of a graph. Instead of constructing a complete graph, we resort to a k-nearest neighbor graph for label propagation with cluster consistency. To the best of our knowledge, this is the first application of kernel methods and 2D Markov models simultaneously to image categorization. Experiments on the Corel and histological image databases demonstrate that the proposed method can achieve superior results.
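The graph-based semi-supervised step can be sketched with the standard label-propagation iteration F ← αSF + (1 − α)Y on a k-nearest-neighbor graph. The paper's affinity is its spatial Markov kernel; plain Euclidean kNN affinities stand in here:

```python
import numpy as np

def propagate_labels(X, y, k=3, alpha=0.9, iters=100):
    """Semi-supervised label propagation on a k-nearest-neighbor graph.
    y holds class ids for labeled points and -1 for unlabeled ones."""
    n = len(X)
    d2 = ((X[:, None] - X[None]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((n, n))
    for i in range(n):                         # symmetric kNN affinity graph
        for j in np.argsort(d2[i])[:k]:
            W[i, j] = W[j, i] = np.exp(-d2[i, j])
    D_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = D_inv_sqrt[:, None] * W * D_inv_sqrt[None, :]   # normalized affinity
    classes = np.unique(y[y >= 0])
    Y0 = (y[:, None] == classes[None, :]).astype(float)  # one-hot seed labels
    F = Y0.copy()
    for _ in range(iters):                     # propagate with clamped seeds
        F = alpha * (S @ F) + (1.0 - alpha) * Y0
    return classes[F.argmax(axis=1)]

# two well-separated blobs, one labeled point in each
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = propagate_labels(X, y)
```

The two seed labels spread along the graph, so each blob ends up consistently labeled, which is the cluster-consistency behavior the paper exploits.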
Citations: 34
Learning semantic visual vocabularies using diffusion distance
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206845
Jingen Liu, Yang Yang, M. Shah
In this paper, we propose a novel approach for learning a generic visual vocabulary. We use diffusion maps to automatically learn a semantic visual vocabulary from abundant quantized midlevel features. Each midlevel feature is represented by the vector of pointwise mutual information (PMI). In this midlevel feature space, we believe the features produced by similar sources must lie on a certain manifold. To capture the intrinsic geometric relations between features, we measure their dissimilarity using diffusion distance. The underlying idea is to embed the midlevel features into a semantic lower-dimensional space. Our goal is to construct a compact yet discriminative semantic visual vocabulary. Although the conventional approach using k-means is good for vocabulary construction, its performance is sensitive to the size of the visual vocabulary. In addition, the learnt visual words are not semantically meaningful since the clustering criterion is based on appearance similarity only. Our proposed approach can effectively overcome these problems by capturing the semantic and geometric relations of the feature space using diffusion maps. Unlike some of the supervised vocabulary construction approaches, and the unsupervised methods such as pLSA and LDA, diffusion maps can capture the local intrinsic geometric relations between the midlevel feature points on the manifold. We have tested our approach on the KTH action dataset, our own YouTube action dataset and the fifteen scene dataset, and have obtained very promising results.
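The diffusion-maps embedding, under which Euclidean distance approximates diffusion distance, follows a standard recipe (Markov normalization plus spectral decomposition). The toy affinity below stands in for affinities between the paper's PMI vectors of midlevel features:

```python
import numpy as np

def diffusion_coords(A, t=2, dim=2):
    """Diffusion-map coordinates at diffusion time t from a symmetric
    affinity matrix A: eigendecompose the symmetric conjugate of the random
    walk P = D^-1 A, convert back to right eigenvectors of P, and scale each
    by its eigenvalue to the power t. Euclidean distance in the returned
    coordinates approximates the diffusion distance."""
    d = A.sum(axis=1)
    Ms = A / np.sqrt(np.outer(d, d))               # D^-1/2 A D^-1/2
    vals, vecs = np.linalg.eigh(Ms)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]               # right eigenvectors of P
    return (vals[1:dim + 1] ** t) * psi[:, 1:dim + 1]   # skip the trivial mode

# two tight groups of "features" with weak cross-group affinity
A = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
Z = diffusion_coords(A)
intra = np.linalg.norm(Z[0] - Z[1])                # same group
inter = np.linalg.norm(Z[0] - Z[2])                # different groups
```

Features linked by many short diffusion paths end up close in the embedding, so clustering in this space groups semantically related midlevel features into visual words.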
Citations: 193
Image deblurring for less intrusive iris capture
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206700
Xinyu Huang, Liu Ren, Ruigang Yang
For most iris capturing scenarios, captured iris images could easily blur when the user is out of the depth of field (DOF) of the camera, or when he or she is moving. The common solution is to let the user try the capturing process again as the quality of these blurred iris images is not good enough for recognition. In this paper, we propose a novel iris deblurring algorithm that can be used to improve the robustness and nonintrusiveness for iris capture. Unlike other iris deblurring algorithms, the key feature of our algorithm is that we use the domain knowledge inherent in iris images and iris capture settings to improve the performance, which could be in the form of iris image statistics, characteristics of pupils or highlights, or even depth information from the iris capturing system itself. Our experiments on both synthetic and real data demonstrate that our deblurring algorithm can significantly restore blurred iris patterns and therefore improve the robustness of iris capture.
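Once the blur kernel is estimated, the non-blind restoration itself can be done with classic Wiener deconvolution. The paper's contribution is estimating that kernel from iris-specific cues, so the sketch below only covers the generic inversion step, on a synthetic ring pattern:

```python
import numpy as np

def wiener_deblur(blurred, kernel, nsr=1e-4):
    """Non-blind Wiener deconvolution in the frequency domain. `nsr` is the
    assumed noise-to-signal ratio; larger values suppress noise at the cost
    of sharpness."""
    K = np.fft.fft2(kernel, s=blurred.shape)
    B = np.fft.fft2(blurred)
    W = np.conj(K) / (np.abs(K) ** 2 + nsr)        # Wiener filter
    return np.real(np.fft.ifft2(W * B))

# toy "iris" pattern: concentric rings, blurred by a 5x5 box kernel
x, y = np.meshgrid(np.arange(64), np.arange(64))
img = 0.5 + 0.5 * np.sin(0.5 * np.sqrt((x - 32.0) ** 2 + (y - 32.0) ** 2))
kernel = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel, s=img.shape)))
restored = wiener_deblur(blurred, kernel)

mse_blurred = np.mean((blurred - img) ** 2)
mse_restored = np.mean((restored - img) ** 2)
```

With the correct kernel, the ring texture (the analogue of the iris pattern used for recognition) is recovered almost exactly; the hard part in practice is that the kernel must be estimated, which is what the paper's cues provide.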
Citations: 27
Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive Basin Hopping Monte Carlo sampling
Pub Date : 2009-06-20 DOI: 10.1109/CVPR.2009.5206502
Junseok Kwon, Kyoung Mu Lee
We propose a novel tracking algorithm for targets whose geometric appearance changes drastically over time. To track such targets, we present a local patch-based appearance model and provide an efficient scheme to evolve the topology between local patches by on-line update. In the process of on-line update, the robustness of each patch in the model is estimated by a new measurement method that analyzes the landscape of the patch's local mode. Patches can be moved, deleted or newly added, which gives the model more flexibility. Additionally, we introduce the Basin Hopping Monte Carlo (BHMC) sampling method to our tracking problem to reduce the computational complexity and deal with the problem of getting trapped in local minima. The BHMC method makes it possible for our appearance model to consist of a sufficient number of patches. Since BHMC uses the same local optimizer that is used in the appearance modeling, it can be efficiently integrated into our tracking framework. Experimental results show that our approach tracks objects whose geometric appearance is drastically changing, accurately and robustly.
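The Basin Hopping idea, a local descent to the nearest minimum followed by a random perturbation and a Metropolis accept/reject on the resulting minima, can be shown in its plain optimization form (the paper adapts it into a sampling scheme inside the tracker, which this sketch does not reproduce):

```python
import numpy as np

def basin_hopping(f, grad, x0, n_hops=50, step=1.5, temp=1.0, seed=0):
    """Minimal Basin Hopping: descend to the nearest local minimum, then
    repeatedly perturb, re-descend, and accept or reject the new minimum
    with a Metropolis test, remembering the best minimum seen."""
    rng = np.random.default_rng(seed)

    def descend(x):
        # plain clipped gradient descent as the local optimizer
        for _ in range(300):
            x = x - 0.02 * np.clip(grad(x), -50.0, 50.0)
        return x

    x = descend(np.asarray(x0, dtype=float))
    best_x, best_f = x, f(x)
    for _ in range(n_hops):
        cand = descend(x + rng.normal(0.0, step, size=x.shape))
        # Metropolis acceptance on the basin minima
        if f(cand) < f(x) or rng.random() < np.exp((f(x) - f(cand)) / temp):
            x = cand
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

# tilted double well: local minimum near x = +1.38, global near x = -1.44
f = lambda x: (x[0] ** 2 - 2.0) ** 2 + 0.5 * x[0]
grad = lambda x: np.array([4.0 * x[0] * (x[0] ** 2 - 2.0) + 0.5])
x_best, f_best = basin_hopping(f, grad, x0=[2.0])
```

Starting in the basin of the worse minimum, the perturb-and-descend hops escape it and land in the global basin, which is exactly why the method suits the highly multi-modal cost surfaces of non-rigid tracking.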
{"title":"Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive Basin Hopping Monte Carlo sampling","authors":"Junseok Kwon, Kyoung Mu Lee","doi":"10.1109/CVPR.2009.5206502","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206502","url":null,"abstract":"We propose a novel tracking algorithm for the target of which geometric appearance changes drastically over time. To track it, we present a local patch-based appearance model and provide an efficient scheme to evolve the topology between local patches by on-line update. In the process of on-line update, the robustness of each patch in the model is estimated by a new method of measurement which analyzes the landscape of local mode of the patch. This patch can be moved, deleted or newly added, which gives more flexibility to the model. Additionally, we introduce the Basin Hopping Monte Carlo (BHMC) sampling method to our tracking problem to reduce the computational complexity and deal with the problem of getting trapped in local minima. The BHMC method makes it possible for our appearance model to consist of enough numbers of patches. Since BHMC uses the same local optimizer that is used in the appearance modeling, it can be efficiently integrated into our tracking framework. Experimental results show that our approach tracks the object whose geometric appearance is drastically changing, accurately and robustly.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121513086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 241
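The abstract above describes Basin Hopping Monte Carlo as alternating random perturbations with local optimization and a Metropolis-style accept/reject step. A generic sketch of basin hopping on a 1D double-well objective — the objective, step size, and local optimizer are illustrative assumptions, not the paper's patch-space formulation:

```python
import math
import random

def f(x):
    """Double-well objective: two basins, global minimum near x = -2.03."""
    return x**4 - 8 * x**2 + x

def local_minimize(x, lr=0.01, steps=200):
    """Crude gradient-descent local optimizer with a finite-difference gradient."""
    for _ in range(steps):
        g = (f(x + 1e-5) - f(x - 1e-5)) / 2e-5
        x -= lr * g
    return x

def basin_hopping(x0, n_hops=100, step=3.0, temp=1.0, seed=0):
    """Basin hopping: perturb the current point, locally minimize, then accept
    or reject the new basin with a Metropolis criterion; track the best basin."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    best_x, best_f = x, f(x)
    for _ in range(n_hops):
        cand = local_minimize(x + rng.uniform(-step, step))
        df = f(cand) - f(x)
        if df < 0 or rng.random() < math.exp(-df / temp):
            x = cand
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

# Start in the shallower right-hand basin; hopping should find the left one.
best_x, best_f = basin_hopping(x0=2.0)
```

Because each hop re-optimizes locally before the accept/reject decision, the sampler moves between basins rather than individual points — which is why, as the abstract notes, sharing the local optimizer with the appearance model lets BHMC integrate cheaply into the tracker.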