
2011 International Conference on Computer Vision: Latest Publications

The NBNN kernel
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126449
T. Tuytelaars, Mario Fritz, Kate Saenko, Trevor Darrell
Naive Bayes Nearest Neighbor (NBNN) has recently been proposed as a powerful, non-parametric approach for object classification that achieves remarkably good results thanks to the avoidance of a vector quantization step and the use of image-to-class comparisons, yielding good generalization. In this paper, we introduce a kernelized version of NBNN. This way, we can learn the classifier in a discriminative setting. Moreover, it then becomes straightforward to combine it with other kernels. In particular, we show that our NBNN kernel is complementary to standard bag-of-features based kernels, focusing on local generalization as opposed to global image composition. By combining them, we achieve state-of-the-art results on the Caltech101 and 15 Scenes datasets. As a side contribution, we also investigate how to speed up the NBNN computations.
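As background for the kernelized variant, a minimal sketch of the underlying NBNN decision rule (the sum of image-to-class nearest-neighbor distances) might look as follows; the data layout and function names are illustrative assumptions, not the authors' code.
```python
import numpy as np
from scipy.spatial import cKDTree

def nbnn_classify(query_descriptors, class_descriptors):
    """query_descriptors: (n, d) local descriptors of a test image.
    class_descriptors: dict mapping class name -> (m_c, d) training descriptors."""
    scores = {}
    for cls, descs in class_descriptors.items():
        tree = cKDTree(descs)
        # image-to-class distance: sum of squared distances from each query
        # descriptor to its nearest neighbor within the class
        dists, _ = tree.query(query_descriptors, k=1)
        scores[cls] = np.sum(dists ** 2)
    # NBNN picks the class with the smallest total image-to-class distance
    return min(scores, key=scores.get)
```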
Citations: 124
Automated corpus callosum extraction via Laplace-Beltrami nodal parcellation and intrinsic geodesic curvature flows on surfaces
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126476
Rongjie Lai, Yonggang Shi, N. Sicotte, A. Toga
The corpus callosum (CC) is an important structure in human brain anatomy. In this work, we propose a fully automated and robust approach to extract the corpus callosum from T1-weighted structural MR images. The novelty of our method lies in two key steps. In the first step, we find an initial guess for the curve representation of the CC by using the zero level set of the first nontrivial Laplace-Beltrami (LB) eigenfunction on the white matter surface. In the second step, the initial curve is deformed toward the final solution with a geodesic curvature flow on the white matter surface. For the numerical solution of the geodesic curvature flow on surfaces, we represent the contour implicitly on a triangular mesh and develop efficient numerical schemes based on the finite element method. Because our method depends only on the intrinsic geometry of the white matter surface, it is robust to orientation differences of the brain across the population. In our experiments, we validate the proposed algorithm on 32 brains from a clinical study of multiple sclerosis and demonstrate the accuracy of our results.
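A minimal sketch of the first step, computing the first nontrivial Laplace-Beltrami eigenfunction on a surface mesh and reading off its nodal (zero level) set as an initial curve, could look like this. For brevity it uses a uniform graph Laplacian rather than the cotangent discretization implied by the paper's finite element scheme, and all names and tolerances are illustrative assumptions.
```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def first_nontrivial_eigenfunction(vertices, faces):
    """vertices: (n, 3) array; faces: (m, 3) integer array of a triangle mesh."""
    n = len(vertices)
    # collect undirected edges from the triangles
    i = np.r_[faces[:, 0], faces[:, 1], faces[:, 2]]
    j = np.r_[faces[:, 1], faces[:, 2], faces[:, 0]]
    adj = sp.coo_matrix((np.ones(len(i)), (i, j)), shape=(n, n))
    adj = ((adj + adj.T) > 0).astype(float)          # symmetric 0/1 adjacency
    laplacian = sp.diags(np.asarray(adj.sum(axis=1)).ravel()) - adj
    # two smallest eigenpairs: the first is the constant function (eigenvalue 0),
    # the second is the first nontrivial eigenfunction
    vals, vecs = eigsh(laplacian.tocsc(), k=2, sigma=-1e-6)
    return vecs[:, np.argsort(vals)[1]]

def zero_level_set_vertices(phi, tol=1e-3):
    # vertices where the eigenfunction is (nearly) zero approximate the nodal curve
    return np.where(np.abs(phi) < tol)[0]
```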
Citations: 23
Adaptive deconvolutional networks for mid and high level feature learning
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126474
Matthew D. Zeiler, Graham W. Taylor, R. Fergus
We present a hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling. When trained on natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts and complete objects. To build our model we rely on a novel inference scheme that ensures each layer reconstructs the input, rather than just the output of the layer directly beneath, as is common with existing hierarchical approaches. This makes it possible to learn multiple layers of representation and we show models with 4 layers, trained on images from the Caltech-101 and 256 datasets. When combined with a standard classifier, features extracted from these models outperform SIFT, as well as representations from other feature learning methods.
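For intuition, a minimal sketch of single-layer convolutional sparse coding inference, here solved with plain ISTA rather than the paper's inference scheme, might look as follows; the filters, step size, and sparsity weight are illustrative assumptions.
```python
import numpy as np
from scipy.signal import fftconvolve

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def conv_sparse_code(image, filters, lam=0.1, step=1e-3, n_iter=100):
    """image: (H, W); filters: (K, h, w). Returns K feature maps of shape (H, W)."""
    K = len(filters)
    z = np.zeros((K,) + image.shape)
    for _ in range(n_iter):
        # reconstruction: sum over feature maps convolved with their filters
        recon = sum(fftconvolve(z[k], filters[k], mode='same') for k in range(K))
        resid = recon - image
        for k in range(K):
            # gradient of the data term w.r.t. z_k is (approximately) the
            # correlation of the residual with the k-th filter
            grad = fftconvolve(resid, filters[k][::-1, ::-1], mode='same')
            z[k] = soft_threshold(z[k] - step * grad, step * lam)
    return z
```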
Citations: 1182
Fast articulated motion tracking using a sums of Gaussians body model
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126338
Carsten Stoll, N. Hasler, Juergen Gall, H. Seidel, C. Theobalt
We present an approach for modeling the human body by Sums of spatial Gaussians (SoG), allowing us to perform fast and high-quality markerless motion capture from multi-view video sequences. The SoG model is equipped with a color model to represent the shape and appearance of the human and can be reconstructed from a sparse set of images. Similar to the human body, we also represent the image domain as a SoG that models color-consistent image blobs. Based on the SoG models of the image and the human body, we introduce a novel continuous and differentiable model-to-image similarity measure that can be used to estimate the skeletal motion of a human at 5–15 frames per second even for many camera views. In our experiments, we show that our method, which does not rely on silhouettes or training data, offers a good balance between accuracy and computational cost.
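A minimal sketch of the kind of continuous model-to-image similarity such a representation permits, the closed-form overlap integral of two unnormalized isotropic 2D Gaussians weighted by a color term, could look like this; the color weighting and blob layout are illustrative assumptions rather than the paper's exact energy.
```python
import numpy as np

def gaussian_overlap(mu_i, sigma_i, mu_j, sigma_j):
    """Integral over the plane of exp(-|x-mu_i|^2/(2 s_i^2)) * exp(-|x-mu_j|^2/(2 s_j^2))."""
    s2 = sigma_i ** 2 + sigma_j ** 2
    d2 = np.sum((np.asarray(mu_i) - np.asarray(mu_j)) ** 2)
    return (2.0 * np.pi * sigma_i ** 2 * sigma_j ** 2 / s2) * np.exp(-d2 / (2.0 * s2))

def sog_similarity(image_blobs, model_blobs):
    """Each blob: (mu, sigma, color). Sums color-weighted overlaps of all blob pairs."""
    total = 0.0
    for mu_i, s_i, c_i in image_blobs:
        for mu_j, s_j, c_j in model_blobs:
            # simple color agreement term in [0, 1]; an illustrative assumption
            color_sim = max(0.0, 1.0 - np.linalg.norm(np.asarray(c_i) - np.asarray(c_j)))
            total += color_sim * gaussian_overlap(mu_i, s_i, mu_j, s_j)
    return total
```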
Citations: 220
Assessing the aesthetic quality of photographs using generic image descriptors
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126444
L. Marchesotti, F. Perronnin, Diane Larlus, G. Csurka
In this paper, we automatically assess the aesthetic properties of images. In the past, this problem has been addressed by hand-crafting features which would correlate with best photographic practices (e.g. “Does this image respect the rule of thirds?”) or with photographic techniques (e.g. “Is this image a macro?”). We depart from this line of research and propose to use generic image descriptors to assess aesthetic quality. We experimentally show that the descriptors we use, which aggregate statistics computed from low-level local features, implicitly encode the aesthetic properties explicitly used by state-of-the-art methods and outperform them by a significant margin.
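A minimal sketch of a generic-descriptor pipeline in this spirit, using a simple bag-of-visual-words encoding and a linear SVM rather than the aggregated statistics used in the paper, might look as follows; all names and parameters are illustrative assumptions.
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def build_codebook(all_local_descriptors, n_words=256):
    # visual vocabulary learned from local descriptors pooled over training images
    return KMeans(n_clusters=n_words, n_init=4).fit(all_local_descriptors)

def encode(image_descriptors, codebook):
    # L1-normalized histogram of visual word occurrences for one image
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_aesthetic_classifier(descriptor_sets, labels, codebook):
    """descriptor_sets: list of (n_i, d) arrays; labels: 1 = high quality, 0 = low."""
    X = np.vstack([encode(d, codebook) for d in descriptor_sets])
    return LinearSVC(C=1.0).fit(X, labels)
```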
Citations: 388
Stereo reconstruction using high order likelihood
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126371
H. Jung, Kyoung Mu Lee, Sang Uk Lee
Under the popular Bayesian approach, a stereo problem can be formulated by defining a likelihood and a prior. Likelihoods are often associated with unary terms, and priors are defined by pair-wise or higher order cliques in a Markov random field (MRF). In this paper, we propose to use a high order likelihood model in stereo. Numerous conventional patch based matching methods such as normalized cross correlation, Laplacian of Gaussian, or census filters are designed under the naive assumption that all the pixels of a patch have the same disparity. However, the patch-wise cost can be formulated as higher order cliques for the MRF so that the matching cost is a function of the image patch's disparities. A patch obtained from the projected image by a disparity map should provide a better match without the blurring effect around disparity discontinuities. Among patch-wise high order matching costs, the census filter approach can be easily reduced to pair-wise cliques. The experimental results on census filter-based high order likelihood demonstrate the advantages of high order likelihood over the independent identically distributed unary model.
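A minimal sketch of the census-filter matching cost mentioned above, a per-pixel neighborhood sign pattern compared by Hamming distance over candidate disparities, could look like this; the window size and cost-volume layout are illustrative assumptions.
```python
import numpy as np

def census_transform(img, radius=2):
    """img: (H, W) grayscale. Returns per-pixel boolean descriptors of shape
    (H, W, (2r+1)^2 - 1): neighbor-less-than-center comparisons."""
    H, W = img.shape
    padded = np.pad(img, radius, mode='edge')
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[radius + dy: radius + dy + H, radius + dx: radius + dx + W]
            bits.append(shifted < img)
    return np.stack(bits, axis=-1)

def census_cost_volume(left, right, max_disp):
    """Hamming distance between left and disparity-shifted right census codes."""
    cl, cr = census_transform(left), census_transform(right)
    H, W, _ = cl.shape
    volume = np.full((max_disp, H, W), np.inf)
    for d in range(max_disp):
        volume[d, :, d:] = np.sum(cl[:, d:] != cr[:, :W - d], axis=-1)
    return volume
```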
Citations: 11
Recognizing jumbled images: The role of local and global information in image classification
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126283
Devi Parikh
The performance of current state-of-the-art computer vision algorithms at image classification falls significantly short as compared to human abilities. To reduce this gap, it is important for the community to know what problems to solve, and not just how to solve them. Towards this goal, via the use of jumbled images, we strip apart two widely investigated aspects: local and global information in images, and identify the performance bottleneck. Interestingly, humans have been shown to reliably recognize jumbled images. The goal of our paper is to determine a functional model that mimics how humans recognize jumbled images, i.e. exploit local information alone, and further evaluate if existing implementations of this computational model suffice to match human performance. Surprisingly, in our series of human studies and machine experiments, we find that a simple bag-of-words based majority-vote-like strategy is an accurate functional model of how humans recognize jumbled images. Moreover, a straightforward machine implementation of this model achieves accuracies similar to human subjects at classifying jumbled images. This indicates that perhaps existing machine vision techniques already leverage local information from images effectively, and future research efforts should be focused on more advanced modeling of global information.
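A minimal sketch of the majority-vote-like functional model described above, classifying each block of a jumbled image on local information alone and voting, might look as follows; the block classifier interface and grid size are illustrative assumptions.
```python
import numpy as np
from collections import Counter

def split_into_blocks(image, grid=4):
    """image: (H, W, C). Returns grid*grid equally sized blocks, row-major order."""
    H, W = image.shape[:2]
    bh, bw = H // grid, W // grid
    return [image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(grid) for c in range(grid)]

def classify_jumbled_image(image, block_classifier, grid=4):
    """block_classifier: callable mapping a single block to a category label."""
    votes = [block_classifier(block) for block in split_into_blocks(image, grid)]
    # majority vote over local, order-free decisions; global layout is ignored
    return Counter(votes).most_common(1)[0][0]
```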
Citations: 45
Globally optimal solution to multi-object tracking with merged measurements
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126532
João F. Henriques, Rui Caseiro, Jorge P. Batista
Multiple object tracking has been formulated recently as a global optimization problem, and solved efficiently with optimal methods such as the Hungarian Algorithm. A severe limitation is the inability to model multiple objects that are merged into a single measurement, and track them as a group, while retaining optimality. This work presents a new graph structure that encodes these multiple-match events as standard one-to-one matches, allowing computation of the solution in polynomial time. Since identities are lost when objects merge, an efficient method to identify groups is also presented, as a flow circulation problem. The problem of tracking individual objects across groups is then posed as a standard optimal assignment. Experiments show increased performance on the PETS 2006 and 2009 datasets compared to state-of-the-art algorithms.
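For context, a minimal sketch of the standard one-to-one optimal assignment baseline that this work generalizes to merged measurements, using the Hungarian algorithm on a frame-to-frame cost matrix, could look like this; the distance-based cost and gating threshold are illustrative assumptions.
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(tracks, detections, max_dist=50.0):
    """tracks, detections: (n, 2) and (m, 2) arrays of x, y positions.
    Returns a list of (track_index, detection_index) pairs."""
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)      # Hungarian algorithm
    # discard assignments whose cost exceeds the gating threshold
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```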
Citations: 145
Compact correlation coding for visual object categorization
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126425
Nobuyuki Morioka, S. Satoh
Spatial relationships between local features are thought to play a vital role in representing object categories. However, learning a compact set of higher-order spatial features based on visual words, e.g., doublets and triplets, remains a challenging problem as possible combinations of visual words grow exponentially. While the local pairwise codebook achieves a compact codebook of pairs of spatially close local features without feature selection, its formulation is not scale invariant and is only suitable for densely sampled local features. In contrast, the proximity distribution kernel is a scale-invariant and robust representation capturing rich spatial proximity information between local features, but its representation grows quadratically in the number of visual words. Inspired by the two abovementioned techniques, this paper presents compact correlation coding, which combines the strengths of the two. Our method achieves a compact representation that is scale-invariant and robust against object deformation. In addition, we adopt sparse coding instead of k-means clustering during the codebook construction to increase the discriminative power of our method. We systematically evaluate our method against both the local pairwise codebook and the proximity distribution kernel on several challenging object categorization datasets to show performance improvements.
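A minimal sketch of the kind of second-order spatial statistic these encodings build on, counting visual-word pairs whose keypoints fall within a radius of each other, might look as follows; the radius and normalization are illustrative assumptions, not the paper's exact coding.
```python
import numpy as np
from scipy.spatial import cKDTree

def pairwise_word_histogram(locations, words, n_words, radius=20.0):
    """locations: (n, 2) keypoint positions; words: (n,) visual word indices."""
    tree = cKDTree(locations)
    hist = np.zeros((n_words, n_words))
    for i, j in tree.query_pairs(radius):
        a, b = sorted((words[i], words[j]))
        hist[a, b] += 1.0                 # count the unordered pair (a, b)
    total = hist.sum()
    return hist / total if total > 0 else hist
```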
Citations: 21
Linear dependency modeling for feature fusion
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126477
A. J. Ma, P. Yuen
This paper addresses the independence assumption issue in the fusion process. In the last decade, dependency modeling techniques were developed under a specific distribution of classifiers. This paper proposes a new framework to model the dependency between features without any assumption on the feature/classifier distribution. We prove that feature dependency can be modeled by a linear combination of the posterior probabilities under some mild assumptions. Based on the linear combination property, two methods, namely Linear Classifier Dependency Modeling (LCDM) and Linear Feature Dependency Modeling (LFDM), are derived and developed for dependency modeling at the classifier level and feature level, respectively. The optimal models for LCDM and LFDM are learned by maximizing the margin between the genuine and imposter posterior probabilities. Both synthetic data and real datasets are used for experiments. Experimental results show that LFDM outperforms all existing combination methods.
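A minimal sketch of classifier-level fusion in this spirit, learning a linear combination of per-classifier posterior probabilities with a margin-based learner (a linear SVM here stands in for the paper's LCDM optimization), might look like this; all names and shapes are illustrative assumptions.
```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_fusion_weights(posteriors, labels):
    """posteriors: (n_samples, n_classifiers) scores for the claimed class;
    labels: 1 for genuine samples, 0 for impostors."""
    svm = LinearSVC(C=1.0).fit(posteriors, labels)
    return svm.coef_.ravel(), svm.intercept_[0]

def fuse(posteriors, weights, bias):
    # fused score: learned linear combination of the individual posteriors
    return posteriors @ weights + bias
```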
Citations: 15