Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206791
2009 IEEE Conference on Computer Vision and Pattern Recognition
Efficient Kernels for identifying unbounded-order spatial features
Yimeng Zhang, Tsuhan Chen
Higher-order spatial features, such as doublets or triplets, have been used to incorporate spatial information into the bag-of-local-features model. Due to computational limits, researchers have only used features up to the 3rd order, i.e., triplets, since the number of features increases exponentially with the order. We propose an algorithm for identifying high-order spatial features efficiently. The algorithm directly evaluates the inner product of the feature vectors from the two images being compared, identifying all high-order features automatically, and hence serves as a kernel for any kernel-based learning algorithm. It is based on the idea that if a high-order spatial feature co-occurs in both images, the occurrence of the feature in one image is a translation of its occurrence in the other. This enables us to compute the kernel in time linear in the number of local features in an image (the same as the bag-of-local-features approach), regardless of the order. Therefore, our algorithm does not limit the upper bound of the order as in previous work. Experimental results on the object categorization task show that high-order features can be calculated efficiently and provide a significant improvement in object categorization performance.
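The translation idea behind the kernel can be sketched in a few lines: matched local features are grouped by their quantized translation offset, and every non-empty subset of the features aligned by one offset is a co-occurring spatial configuration of some order. This is an illustrative reconstruction, not the authors' implementation; the feature format, offset quantization, and subset-counting scheme are assumptions.

```python
from collections import Counter

def offset_kernel(feats_a, feats_b, quant=10):
    """Count spatial configurations of all orders co-occurring in two
    images under a common translation. feats_* are lists of
    (visual_word, x, y) tuples (format assumed for illustration)."""
    # index image B's features by visual word
    index_b = {}
    for w, x, y in feats_b:
        index_b.setdefault(w, []).append((x, y))
    # histogram of quantized translation offsets between matched words
    # (assumes few occurrences per word, so this loop stays near-linear)
    offsets = Counter()
    for w, xa, ya in feats_a:
        for xb, yb in index_b.get(w, []):
            t = (round((xb - xa) / quant), round((yb - ya) / quant))
            offsets[t] += 1
    # every non-empty subset of the n features sharing an offset is a
    # co-occurring spatial feature of some order: 2**n - 1 subsets
    return sum(2 ** n - 1 for n in offsets.values())
```

Note that the exponential number of high-order features never has to be enumerated: only the per-offset counts are stored.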
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206774
Learning optimized MAP estimates in continuously-valued MRF models
Kegan G. G. Samuel, M. Tappen
We present a new approach for the discriminative training of continuous-valued Markov Random Field (MRF) model parameters. We train the MRF model by optimizing the parameters so that the minimum-energy solution of the model is as similar as possible to the ground truth. This leads to parameters that are directly optimized to increase the quality of the MAP estimates during inference. The proposed technique yields a framework that is flexible and intuitively easy to understand and implement, which makes it an attractive alternative for learning the parameters of a continuous-valued MRF model. We demonstrate the effectiveness of our technique by applying it to image denoising and in-painting using the Field of Experts model. In our experiments, the performance of our system compares favourably to the Field of Experts model trained using contrastive divergence on both the denoising and in-painting tasks.
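The core idea, choosing parameters so that the minimum-energy solution matches the ground truth, can be illustrated on a toy quadratic MRF whose MAP estimate has a closed form. Everything here is an illustrative stand-in: a 1-D chain, a single smoothness weight, and a line search in place of the paper's gradient-based training of the Field of Experts model.

```python
import numpy as np

def map_estimate(y, lam):
    # MAP of a quadratic MRF on a 1-D chain: argmin ||x - y||^2 + lam*||Dx||^2,
    # solved in closed form via the normal equations
    n = len(y)
    D = np.diff(np.eye(n), axis=0)  # pairwise difference operator
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

def train_lambda(y, gt, grid=np.linspace(0.0, 5.0, 101)):
    # discriminative training in miniature: pick the smoothness weight
    # whose MAP estimate is closest to the ground truth
    losses = [np.sum((map_estimate(y, l) - gt) ** 2) for l in grid]
    return grid[int(np.argmin(losses))]
```

The training loss is measured directly on the MAP estimate, which is the property the paper's framework optimizes for.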
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206739
Minimizing sparse higher order energy functions of discrete variables
C. Rother, Pushmeet Kohli, Wei Feng, Jiaya Jia
Higher order energy functions have the ability to encode high level structural dependencies between pixels, which have been shown to be extremely powerful for image labeling problems. Their use, however, is severely hampered in practice by the intractable complexity of representing and minimizing such functions. We observe that higher order functions encountered in computer vision are very often “sparse”, i.e., many labelings of a higher order clique are equally unlikely and hence have the same high cost. In this paper, we address the problem of minimizing such sparse higher order energy functions. Our method works by transforming the problem into an equivalent quadratic function minimization problem. The resulting quadratic function can be minimized using popular message passing or graph cut based algorithms for MAP inference. Although this is primarily a theoretical paper, it also shows how higher order functions can be used to obtain impressive results for the binary texture restoration problem.
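The “sparse” structure can be made concrete: a higher-order clique potential that assigns low costs to a few named labelings and one shared high cost to everything else is fully specified by those few patterns. The sketch below is illustrative only; the auxiliary switch variable is minimized by brute force here, standing in for the paper's transformation to a pairwise problem solved by message passing or graph cuts, and the deviation penalty M is an assumption.

```python
def sparse_energy(patterns, default):
    # a sparse higher-order potential: a few special labelings are cheap,
    # all remaining labelings of the clique share one high default cost
    def energy(x):
        return patterns.get(tuple(x), default)
    return energy

def via_switch(patterns, default, M=1e6):
    # equivalent form with an auxiliary switch: selecting pattern p pays
    # its cost plus M per variable deviating from it, while the fallback
    # pays the default cost; minimizing over the switch recovers the
    # original potential using only pairwise (switch, x_i) interactions
    pats = list(patterns.items())
    def energy(x):
        best = default
        for pat, cost in pats:
            best = min(best, cost + sum(M for xi, pi in zip(x, pat) if xi != pi))
        return best
    return energy
```

Storage is proportional to the number of low-cost patterns, not to the exponential number of clique labelings.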
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206711
High-quality curvelet-based motion deblurring from an image pair
Jian-Feng Cai, Hui Ji, Chaoqiang Liu, Zuowei Shen
One promising approach to removing motion blur is to recover a single clear image from an image pair. Existing dual-image methods require accurate image alignment between the pair, which can be very challenging even with the help of user interaction. Based on the observation that typical motion-blur kernels have an extremely sparse representation in the redundant curvelet system, we propose a new minimization model that recovers a clear image from the blurred image pair by enhancing the sparsity of the blur kernels in the curvelet system. The sparsity prior on the motion-blur kernels improves the robustness of our algorithm to image alignment errors and image formation noise. We also present a numerical method to efficiently solve the resulting minimization problem. Experiments show that the proposed algorithm accurately estimates the blur kernels of complex camera motions with low requirements on the accuracy of image alignment, which in turn leads to a high-quality image recovered from the blurred pair.
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206826
Holistic context modeling using semantic co-occurrences
Nikhil Rasiwasia, N. Vasconcelos
We present a simple framework to model contextual relationships between visual concepts. The new framework combines ideas from previous object-centric methods (which model contextual relationships between objects in an image, such as their co-occurrence patterns) and scene-centric methods (which learn a holistic context model from the entire image, known as its “gist”). This is accomplished without demarcating individual concepts or regions in the image. First, using the output of a generic appearance-based concept detection system, a semantic space is formulated, where each axis represents a semantic feature. Next, context models are learned for each of the concepts in the semantic space, using mixtures of Dirichlet distributions. Finally, an image is represented as a vector of posterior concept probabilities under these contextual concept models. It is shown that these posterior probabilities are remarkably noise-free and form an effective model of the contextual relationships between semantic concepts in natural images. This is further demonstrated through an experimental evaluation on two vision tasks, scene classification and image annotation, on benchmark datasets. The results show that, besides being simple to compute, the proposed context models achieve superior performance to state-of-the-art systems in both tasks.
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206697
Large displacement optical flow
T. Brox, C. Bregler, Jitendra Malik
The literature currently provides two ways to establish point correspondences between images with moving objects. On one side, there are energy minimization methods that yield very accurate, dense flow fields, but fail as displacements get too large. On the other side, there is descriptor matching that allows for large displacements, but correspondences are very sparse, have limited accuracy, and due to missing regularity constraints there are many outliers. In this paper we propose a method that can combine the advantages of both matching strategies. A region hierarchy is established for both images. Descriptor matching on these regions provides a sparse set of hypotheses for correspondences. These are integrated into a variational approach and guide the local optimization to large displacement solutions. The variational optimization selects among the hypotheses and provides dense and subpixel accurate estimates, making use of geometric constraints and all available image information.
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206569
Observe locally, infer globally: A space-time MRF for detecting abnormal activities with incremental updates
Jaechul Kim, K. Grauman
We propose a space-time Markov random field (MRF) model to detect abnormal activities in video. The nodes in the MRF graph correspond to a grid of local regions in the video frames, and neighboring nodes in both space and time are associated with links. To learn normal patterns of activity at each local node, we capture the distribution of its typical optical flow with a mixture of probabilistic principal component analyzers. For any new optical flow patterns detected in incoming video clips, we use the learned model and MRF graph to compute a maximum a posteriori estimate of the degree of normality at each local node. Further, we show how to incrementally update the current model parameters as new video observations stream in, so that the model can efficiently adapt to visual context changes over a long period of time. Experimental results on surveillance videos show that our space-time MRF model robustly detects abnormal activities both in a local and global sense: not only does it accurately localize the atomic abnormal activities in a crowded video, but at the same time it captures the global-level abnormalities caused by irregular interactions between local activities.
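The per-node normality scoring can be sketched with plain PCA standing in for the paper's mixture of probabilistic principal component analyzers: fit a low-dimensional subspace to the optical-flow descriptors observed at one grid node, then score new flow by its reconstruction error. A single subspace per node, the descriptor format, and the negative-error score are all illustrative assumptions.

```python
import numpy as np

def fit_flow_model(flows, k=2):
    """flows: (n, d) array of optical-flow descriptors observed at one
    grid node during training (format assumed for illustration)."""
    mean = flows.mean(axis=0)
    # top-k principal directions of the local flow distribution
    _, _, vt = np.linalg.svd(flows - mean, full_matrices=False)
    return mean, vt[:k]

def normality_score(model, flow):
    # residual of the new flow after projection onto the learned
    # subspace; scores near zero mean the flow looks normal
    mean, comps = model
    centered = flow - mean
    resid = centered - comps.T @ (comps @ centered)
    return -np.linalg.norm(resid)
```

In the paper these local scores are not used in isolation: the MRF links couple neighboring nodes in space and time before the MAP estimate of normality is computed.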
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206489
Observable subspaces for 3D human motion recovery
A. Fossati, M. Salzmann, P. Fua
The articulated body models used to represent human motion typically have many degrees of freedom, usually expressed as joint angles that are highly correlated. The true range of motion can therefore be represented by latent variables that span a low-dimensional space. This has often been used to make motion tracking easier. However, learning the latent space in a problem-independent way makes it nontrivial to initialize the tracking process by picking appropriate initial values for the latent variables, and thus for the pose. In this paper, we show that by directly using observable quantities as our latent variables, we eliminate this problem and achieve full automation given only modest amounts of training data. More specifically, we exploit the fact that the trajectory of a person's feet or hands strongly constrains body pose in motions such as skating, skiing, or golfing. These trajectories are easy to compute and to parameterize using a few variables. We treat these as our latent variables and learn a mapping between them and sequences of body poses. In this manner, by simply tracking the feet or the hands, we can reliably guess initial poses over whole sequences and then refine them.
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206781
Modeling images as mixtures of reference images
F. Perronnin, Yan Liu
A state-of-the-art approach to measure the similarity of two images is to model each image by a continuous distribution, generally a Gaussian mixture model (GMM), and to compute a probabilistic similarity between the GMMs. One limitation of traditional measures such as the Kullback-Leibler (KL) divergence and the probability product kernel (PPK) is that they measure a global match of distributions. This paper introduces a novel image representation. We propose to approximate an image, modeled by a GMM, as a convex combination of K reference image GMMs, and then to describe the image as the K-dimensional vector of mixture weights. The computed weights encode a similarity that favors local matches (i.e. matches of individual Gaussians) and is therefore fundamentally different from the KL or PPK. Although the computation of the mixture weights is a convex optimization problem, its direct optimization is difficult. We propose two approximate optimization algorithms: the first one based on traditional sampling methods, the second one based on a variational bound approximation of the true objective function. We apply this novel representation to the image categorization problem and compare its performance to traditional kernel-based methods. We demonstrate on the PASCAL VOC 2007 dataset a consistent increase in classification accuracy.
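The sampling-based approximation can be sketched in one dimension: draw samples from the query image's model and fit the convex weights of the fixed reference densities by EM. This is illustrative only; 1-D densities passed as plain callables stand in for the reference image GMMs, and the sample count and iteration budget are arbitrary.

```python
import numpy as np

def reference_weights(samples, ref_pdfs, iters=200):
    """EM for the mixture weights of fixed reference densities.
    samples: draws from the query image's model; ref_pdfs: one density
    callable per reference image (illustrative stand-ins for GMMs)."""
    # likelihood of every sample under every reference density, (n, K)
    L = np.array([[pdf(x) for pdf in ref_pdfs] for x in samples])
    w = np.full(L.shape[1], 1.0 / L.shape[1])  # uniform initialization
    for _ in range(iters):
        r = L * w
        r /= r.sum(axis=1, keepdims=True)  # E-step: responsibilities
        w = r.mean(axis=0)                 # M-step: new mixture weights
    return w
```

The resulting K-dimensional weight vector is the image representation; only the components of reference models that locally explain the samples receive appreciable weight.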
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206539
Automatic registration of LIDAR and optical images of urban scenes
Andrew Mastin, J. Kepner, John W. Fisher III
Fusion of 3D laser radar (LIDAR) imagery and aerial optical imagery is an efficient method for constructing 3D virtual-reality models. One difficult aspect of creating such models is registering the optical image with the LIDAR point cloud, which we cast as a camera pose estimation problem. We propose a novel application of mutual-information registration methods that exploits the statistical dependency, in urban scenes, of optical appearance on measured LIDAR elevation. We use the well-known downhill simplex optimization to infer camera pose parameters and discuss three methods for measuring mutual information between LIDAR imagery and optical imagery. Using OpenGL and graphics hardware in the optimization process yields registration times dramatically lower than previous methods. Starting from an initial registration comparable to GPS/INS accuracy, we demonstrate the utility of our algorithm on a collection of urban images and present 3D models created from the fused imagery.
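A standard way to estimate the mutual information such methods maximize is a joint histogram over co-located pixel values. This generic sketch is not the authors' specific elevation/appearance measures; the bin count is arbitrary, and in the registration loop one input would be the LIDAR elevation rendered under the candidate camera pose.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """MI (in nats) between two equally-sized images, estimated from the
    joint histogram of co-located pixel values."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nz = pxy > 0                              # avoid log(0) terms
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```

Registration then amounts to maximizing this score over the camera pose parameters, e.g. with the downhill simplex method the paper uses.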