
Latest publications from the 2011 International Conference on Computer Vision

Linear time offline tracking and lower envelope algorithms
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126451
Steve Gu, Ying Zheng, Carlo Tomasi
Offline tracking of visual objects is particularly helpful in the presence of significant occlusions, when a frame-by-frame, causal tracker is likely to lose sight of the target. In addition, the trajectories found by offline tracking are typically smoother and more stable because of the global optimization this approach entails. In contrast with previous work, we show that this global optimization can be performed in O(MNT) time for T frames of video at M × N resolution, with the help of the generalized distance transform developed by Felzenszwalb and Huttenlocher [13]. Recognizing the importance of this distance transform, we extend the computation to a more general lower envelope algorithm in certain heterogeneous l1-distance metric spaces. The generalized lower envelope algorithm is of complexity O(MN(M+N)) and is useful for a more challenging offline tracking problem. Experiments show that trajectories found by offline tracking are superior to those computed by online tracking methods, and are computed at 100 frames per second.
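To make the lower-envelope idea concrete, here is a minimal sketch (not the authors' code) of the classic two-pass 1D l1 distance transform underlying Felzenszwalb and Huttenlocher's generalized distance transform. It assumes unit-weight l1 costs; applying it once per row, per column, and per frame of an M × N × T cost volume gives the O(MNT) total the abstract cites.

```python
import numpy as np

def l1_distance_transform_1d(cost):
    """Lower envelope of l1 cones: out[x] = min_y (cost[y] + |x - y|).

    Two linear passes; run over every row, column and frame of an
    M x N x T cost volume to relax it in O(MNT) time overall.
    """
    out = cost.astype(float).copy()
    for x in range(1, len(out)):              # forward pass: minima from the left
        out[x] = min(out[x], out[x - 1] + 1.0)
    for x in range(len(out) - 2, -1, -1):     # backward pass: minima from the right
        out[x] = min(out[x], out[x + 1] + 1.0)
    return out

costs = np.array([5.0, 9.0, 0.0, 7.0, 2.0, 8.0])
print(l1_distance_transform_1d(costs))        # [2. 1. 0. 1. 2. 3.]
```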
Citations: 15
Extracting adaptive contextual cues from unlabeled regions
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126282
Congcong Li, Devi Parikh, Tsuhan Chen
Existing approaches to contextual reasoning for enhanced object detection typically utilize other labeled categories in the images to provide contextual information. As a consequence, they inadvertently commit to the granularity of information implicit in the labels. Moreover, large portions of the images may not belong to any of the manually-chosen categories, and these unlabeled regions are typically neglected. In this paper, we overcome both these drawbacks and propose a contextual cue that exploits unlabeled regions in images. Our approach adaptively determines the granularity (scene, inter-object, intra-object, etc.) at which contextual information is captured. In order to extract the proposed contextual cue, we consider a scene to be a structured configuration of objects and regions; just as an object is a composition of parts. We thus learn our proposed “contextual meta-objects” using any off-the-shelf object detector, which makes our proposed cue widely accessible to the community. Our results show that incorporating our proposed cue provides a relative improvement of 12% over a state-of-the-art object detector on the challenging PASCAL dataset.
Citations: 47
Multi-view repetitive structure detection
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126285
Nianjuan Jiang, P. Tan, L. Cheong
Symmetry, and especially repetitive structure in architecture, is demonstrated universally across countries and cultures. Existing detection methods mainly focus on detecting planar patterns in a single image. It is difficult to apply them to repetitive structures in architecture, which abounds with non-planar 3D repetitive elements (such as balconies and windows) and curved surfaces. We study the repetitive structure detection problem from multiple images of such architecture. Our method jointly analyzes these images and a set of 3D points reconstructed from them by structure-from-motion algorithms. 3D points help to rectify geometric deformations and hypothesize possible lattice structures, while images provide denser color and texture information to evaluate and confirm these hypotheses. In the experiments, we compare our method with an existing algorithm. We also show how our results might be used to assist image-based modeling.
Citations: 23
Panoramic stereo video textures
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126376
V. Chapdelaine-Couture, M. Langer, S. Roy
A panoramic stereo (or omnistereo) pair of images provides depth information from stereo up to 360 degrees around a central observer. Because omnistereo lenses or mirrors do not yet exist, synthesizing omnistereo images requires multiple stereo camera positions and baseline orientations. Recent omnistereo methods stitch together many small field of view images called slits which are captured by one or two cameras following a circular motion. However, these methods produce omnistereo images for static scenes only. The situation is much more challenging for dynamic scenes since stitching needs to occur over both space and time and should synchronize the motion between left and right views as much as possible. This paper presents the first ever method for synthesizing panoramic stereo video textures. The method uses full frames rather than slits and uses blending across seams rather than smoothing or matching based on graph cuts. The method produces loopable panoramic stereo videos that can be displayed up to 360 degrees around a viewer.
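As a hedged illustration of the "blending across seams" step (a sketch of the general idea, not the paper's implementation; the function name and fixed-width overlap are assumptions), the fragment below linearly cross-fades two overlapping frame strips instead of matching the seam with graph cuts.

```python
import numpy as np

def blend_seam(left, right, overlap):
    """Linearly cross-fade two frame strips over their overlap region."""
    alpha = np.linspace(1.0, 0.0, overlap)[None, :]  # weight 1 -> 0 across the seam
    mixed = left[:, -overlap:] * alpha + right[:, :overlap] * (1.0 - alpha)
    return np.hstack([left[:, :-overlap], mixed, right[:, overlap:]])

left, right = np.ones((4, 6)), np.zeros((4, 6))
print(blend_seam(left, right, overlap=3))            # smooth 1 -> 0 transition
```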
Citations: 28
Sorted Random Projections for robust texture classification
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126267
Li Liu, P. Fieguth, Gangyao Kuang, H. Zha
This paper presents a simple and highly effective system for robust texture classification, based on (1) random local features, (2) a simple global Bag-of-Words (BoW) representation, and (3) classification based on Support Vector Machines (SVMs). The key contribution of this work is to apply a sorting strategy to a universal yet information-preserving random projection (RP) technique, and then to compare two different texture image representations (histograms and signatures) with various kernels in the SVMs. We have tested our texture classification system on six popular and challenging texture databases for exemplar-based texture classification, comparing with 12 recent state-of-the-art methods. Experimental results show that our texture classification system yields the best classification rates we are aware of: 99.37% for CUReT, 97.16% for Brodatz, 99.30% for UMD and 99.29% for KTH-TIPS. Moreover, combining random features significantly outperforms the state-of-the-art descriptors in material categorization.
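A minimal sketch of the sorting-plus-random-projection idea (illustrative only; the paper's actual feature extraction, BoW quantization, and kernel choices are richer than this): sorting the values of each local patch before projecting makes the feature invariant to the ordering of pixels within the patch, while the fixed random Gaussian matrix approximately preserves distances.

```python
import numpy as np

rng = np.random.default_rng(0)

def sorted_random_projection(patches, out_dim=10):
    """Sort each flattened patch, then project it with a fixed random
    Gaussian matrix (Johnson-Lindenstrauss style dimensionality
    reduction). Sorting discards pixel order, buying robustness."""
    n, d = patches.shape
    proj = rng.normal(size=(d, out_dim)) / np.sqrt(out_dim)
    return np.sort(patches, axis=1) @ proj

# toy usage: 100 random 7x7 patches -> 10-D sorted-RP features, which
# would then be quantized into a BoW histogram and fed to an SVM
patches = rng.random((100, 49))
print(sorted_random_projection(patches).shape)   # (100, 10)
```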
Citations: 84
Center-surround divergence of feature statistics for salient object detection
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126499
D. A. Klein, S. Frintrop
In this paper, we introduce a new method to detect salient objects in images. The approach is based on the standard structure of cognitive visual attention models, but realizes the computation of saliency in each feature dimension in an information-theoretic way. The method allows a consistent computation of all feature channels and a well-founded fusion of these channels to a saliency map. Our framework enables the computation of arbitrarily scaled features and local center-surround pairs in an efficient manner. We show that our approach outperforms eight state-of-the-art saliency detectors in terms of precision and recall.
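To illustrate the center-surround divergence per feature channel, here is a minimal sketch (an assumption-laden stand-in, not the authors' implementation) that scores one location by the KL divergence between intensity histograms of a center disc and its surrounding ring.

```python
import numpy as np

def center_surround_kl(image, cx, cy, r_in=8, r_out=16, bins=16):
    """Saliency at (cx, cy): KL divergence between the feature (here,
    intensity) histogram of a center disc and that of its surround ring."""
    ys, xs = np.ogrid[:image.shape[0], :image.shape[1]]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    center = image[d2 <= r_in ** 2]
    surround = image[(d2 > r_in ** 2) & (d2 <= r_out ** 2)]
    eps = 1e-9
    p, _ = np.histogram(center, bins=bins, range=(0.0, 1.0))
    q, _ = np.histogram(surround, bins=bins, range=(0.0, 1.0))
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))      # KL(center || surround)

img = np.random.default_rng(0).random((64, 64))
img[24:40, 24:40] += 2.0                         # bright salient blob
img /= img.max()
print(center_surround_kl(img, 32, 32))           # high divergence on the blob
print(center_surround_kl(img, 8, 8))             # low divergence on background
```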
Citations: 362
iGroup: Weakly supervised image and video grouping
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126493
Andrew Gilbert, R. Bowden
We present a generic, efficient and iterative algorithm for interactively clustering classes of images and videos. The approach moves away from the use of large hand-labelled training datasets, instead allowing the user to find natural groups of similar content based upon a handful of “seed” examples. Two efficient data mining tools originally developed for text analysis, min-Hash and APriori, are used and extended to achieve both speed and scalability on large image and video datasets. Inspired by the Bag-of-Words (BoW) architecture, the idea of an image signature is introduced as a simple descriptor on which nearest-neighbour classification can be performed. The image signature is then dynamically expanded to identify common features amongst samples of the same class. The iterative approach uses APriori to identify common and distinctive elements of a small set of labelled true- and false-positive signatures. These elements are then accentuated in the signature to increase similarity between examples and “pull” positive classes together. By repeating this process, the accuracy of similarity increases dramatically despite only a few training examples; compared to other approaches, only 10% of the labelled ground truth is needed. It is tested on two image datasets, including the Caltech101 [9] dataset, and on three state-of-the-art action recognition datasets. On the YouTube [18] video dataset the accuracy increases from 72% to 97% using only 44 labelled examples from a dataset of over 1200 videos. The approach is both scalable and efficient, with an iteration on the full YouTube dataset taking around 1 minute on a standard desktop machine.
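The min-Hash ingredient can be sketched in a few lines (illustrative; the paper's image signature and its dynamic expansion go beyond plain sets of visual words): each image's set of visual-word IDs is reduced to a short signature whose collision rate estimates Jaccard similarity.

```python
import random

def minhash_signature(word_ids, num_hashes=50, prime=2_000_003, seed=0):
    """Min-hash sketch of a set of visual-word IDs: for each random
    linear hash, keep the minimum hashed value over the set."""
    rnd = random.Random(seed)
    hashes = [(rnd.randrange(1, prime), rnd.randrange(prime))
              for _ in range(num_hashes)]
    return [min((a * w + b) % prime for w in word_ids) for a, b in hashes]

def estimated_jaccard(sig1, sig2):
    """Fraction of agreeing entries estimates |A ∩ B| / |A ∪ B|."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

img_a = {3, 17, 42, 99, 256, 731}   # visual words present in image A
img_b = {3, 17, 42, 99, 500, 731}   # a mostly overlapping image B
print(estimated_jaccard(minhash_signature(img_a), minhash_signature(img_b)))
# close to the true Jaccard 5/7 ≈ 0.71
```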
Citations: 10
Discriminative learning of relaxed hierarchy for large-scale visual recognition
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126481
Tianshi Gao, D. Koller
In the real visual world, the number of categories a classifier needs to discriminate is on the order of hundreds or thousands. For example, the SUN dataset [24] contains 899 scene categories and ImageNet [6] has 15,589 synsets. Designing a multiclass classifier that is both accurate and fast at test time is an extremely important problem in both the machine learning and computer vision communities. To achieve a good trade-off between accuracy and speed, we adopt the relaxed hierarchy structure from [15], where a set of binary classifiers is organized in a tree or DAG (directed acyclic graph) structure. At each node, classes are colored into positive and negative groups which are separated by a binary classifier, while a subset of confusing classes is ignored. We color the classes and learn the induced binary classifier simultaneously using a unified and principled max-margin optimization. We provide an analysis of generalization error to justify our design. Our method has been tested on both Caltech-256 (object recognition) [9] and the SUN dataset (scene classification) [24], and shows significant improvement over existing methods.
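One node of such a relaxed hierarchy might look like the sketch below (a toy reconstruction under stated assumptions, not the paper's joint max-margin optimization; here the coloring is fixed by hand rather than learned): classes are colored positive or negative, a confusing subset is left out of training, and a binary classifier separates the rest.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_relaxed_node(X, y, pos_classes, neg_classes, ignored_classes):
    """One node of a relaxed hierarchy: train a binary classifier on the
    positively vs. negatively colored classes, ignoring the confusing
    subset (ignored classes are routed down both branches at test time)."""
    assert not (set(pos_classes) | set(neg_classes)) & set(ignored_classes)
    keep = np.isin(y, list(pos_classes) + list(neg_classes))
    labels = np.isin(y[keep], list(pos_classes)).astype(int)
    return LinearSVC().fit(X[keep], labels)

# toy data: 5 Gaussian classes in 2-D; class 4 sits in the middle and
# is treated as 'confusing' at this node
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [0, 4], [4, 0], [4, 4], [2, 2]])
y = np.repeat(np.arange(5), 100)
X = centers[y] + rng.normal(scale=0.5, size=(500, 2))
node = train_relaxed_node(X, y, pos_classes={0, 1}, neg_classes={2, 3},
                          ignored_classes={4})
print(node.predict([[0, 2], [4, 2]]))   # [1 0]: routed to opposite branches
```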
Citations: 181
Image representation by active curves
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126447
Wenze Hu, Y. Wu, Song-Chun Zhu
This paper proposes a sparse image representation using deformable templates of simple geometric structures that are commonly observed in images of natural scenes. These deformable templates include active curve templates and active corner templates. An active curve template is a composition of Gabor wavelet elements placed with equal spacing on a straight line segment or a circular arc segment of constant curvature, where each Gabor wavelet element is allowed to locally shift its location and orientation, so that the original line or arc segment of the active curve template can be deformed to fit the observed image. An active corner or angle template is a composition of two active curve templates that share a common end point, and the active curve templates are allowed to vary their overall lengths and curvatures, so that the original corner template can deform to match the observed image. This paper then proposes a hierarchical computational architecture of SUM-MAX maps that pursues a sparse representation of an image by selecting a small number of active curve and corner templates from a dictionary of all such templates. Experiments show that the proposed method is capable of finding sparse representations of natural images. It is also shown that object templates can be learned by selecting and composing active curve and corner templates.
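A rough sketch of scoring one straight active curve template with a SUM-MAX pass follows (an interpretation for illustration, with assumed parameters; the paper additionally perturbs orientation and handles arcs and corners): Gabor elements are spaced equally along the segment, each may locally shift (the MAX over a small window), and the template score is the SUM over elements.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import fftconvolve

def gabor_kernel(theta, sigma=2.0, lam=4.0, size=11):
    """Even, zero-mean Gabor: an oriented cosine under a Gaussian envelope."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    xr = xs * np.cos(theta) + ys * np.sin(theta)
    g = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lam)
    return g - g.mean()

def active_curve_score(image, p0, p1, n_elements=8, shift=2):
    """SUM-MAX score of a straight active curve from p0 to p1, given as (x, y)."""
    theta = np.arctan2(p1[1] - p0[1], p1[0] - p0[0])
    # carrier taken perpendicular to the segment so each element
    # responds to a bar lying along it
    resp = np.abs(fftconvolve(image, gabor_kernel(theta + np.pi / 2), mode='same'))
    local_max = maximum_filter(resp, size=2 * shift + 1)  # allowed local shift
    ts = np.linspace(0.0, 1.0, n_elements)
    pts = [(round(p0[0] + t * (p1[0] - p0[0])),
            round(p0[1] + t * (p1[1] - p0[1]))) for t in ts]
    return float(sum(local_max[y, x] for x, y in pts))

img = np.zeros((64, 64))
img[32, 10:54] = 1.0                             # a horizontal bar
print(active_curve_score(img, (12, 32), (50, 32)))
```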
Citations: 9
A linear subspace learning approach via sparse coding
Pub Date: 2011-11-06 DOI: 10.1109/ICCV.2011.6126313
Lei Zhang, Peng Fei Zhu, Q. Hu, D. Zhang
Linear subspace learning (LSL) is a popular approach to image recognition; it aims to reveal the essential features of high-dimensional data, e.g., facial images, in a lower-dimensional space by linear projection. Most LSL methods learn the subspace directly from the statistics of the original training samples. However, these methods do not effectively exploit the different contributions of different image components to image recognition. We propose a novel LSL approach based on sparse coding and feature grouping. A dictionary is learned from the training dataset and used to sparsely decompose the training samples. The decomposed image components are grouped into a more discriminative part (MDP) and a less discriminative part (LDP). An unsupervised criterion and a supervised criterion are then proposed to learn the desired subspace, where the MDP is preserved and the LDP is suppressed simultaneously. Experimental results on benchmark face image databases validate that the proposed methods outperform many state-of-the-art LSL schemes.
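A compact sketch of the sparse-coding-and-grouping step (a stand-in under assumptions: scikit-learn's dictionary learner replaces the paper's, and a Fisher-style score of the sparse coefficients stands in for the paper's grouping criteria):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))            # toy 'face' vectors, two classes
y = np.repeat([0, 1], 30)
X[y == 1, :5] += 2.0                     # class-dependent structure

dl = DictionaryLearning(n_components=15, transform_algorithm='lasso_lars',
                        alpha=1.0, random_state=0).fit(X)
codes = dl.transform(X)                  # sparse coefficients per sample

# score each atom by a Fisher-like ratio: between-class separation of
# its coefficients over their within-class spread
mu0, mu1 = codes[y == 0].mean(0), codes[y == 1].mean(0)
within = codes[y == 0].var(0) + codes[y == 1].var(0) + 1e-9
score = (mu0 - mu1) ** 2 / within
mdp_atoms = score > np.median(score)     # 'more discriminative part'

# reconstruct only the MDP; a subspace would then be learned on these
# reconstructions while the less discriminative part is suppressed
X_mdp = codes[:, mdp_atoms] @ dl.components_[mdp_atoms]
print(X_mdp.shape)                       # (60, 30)
```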
Citations: 41