
Latest publications from the 2011 International Conference on Computer Vision

Multiclass recognition and part localization with humans in the loop
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126539
C. Wah, Steve Branson, P. Perona, Serge J. Belongie
We propose a visual recognition system that is designed for fine-grained visual categorization. The system is composed of a machine and a human user. The user, who is unable to carry out the recognition task by himself, is interactively asked to provide two heterogeneous forms of information: clicking on object parts and answering binary questions. The machine intelligently selects the most informative question to pose to the user in order to identify the object's class as quickly as possible. By leveraging computer vision and analyzing the user responses, the overall amount of human effort required, measured in seconds, is minimized. We demonstrate promising results on a challenging dataset of uncropped images, achieving a significant average reduction in human effort over previous methods.
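
As a rough sketch of the question-selection step described above, the snippet below picks the binary question with the highest expected information gain under a maintained posterior over classes. The array layout, the names, and the simple yes/no answer model are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def select_question(posterior, p_yes_given_class):
    """Pick the binary question with the highest expected information gain.

    posterior         -- (C,) current belief over object classes
    p_yes_given_class -- (Q, C) hypothetical model of P(yes | class, question)
    """
    h_now = entropy(posterior)
    gains = []
    for p_yes_c in p_yes_given_class:
        p_yes = float(p_yes_c @ posterior)
        h_expected = 0.0
        for p_a, lik in ((p_yes, p_yes_c), (1.0 - p_yes, 1.0 - p_yes_c)):
            if p_a > 0:
                post_a = lik * posterior / p_a   # Bayes update for this answer
                h_expected += p_a * entropy(post_a)
        gains.append(h_now - h_expected)
    return int(np.argmax(gains))
```

With the posterior updated by Bayes' rule after each user response, repeating this selection yields the kind of adaptive questioning loop the abstract describes.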
Citations: 188
Discovering favorite views of popular places with iconoid shift
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126361
Tobias Weyand, B. Leibe
In this paper, we propose a novel algorithm for automatic landmark building discovery in large, unstructured image collections. In contrast to other approaches which aim at a hard clustering, we regard the task as a mode estimation problem. Our algorithm searches for local attractors in the image distribution that have a maximal mutual homography overlap with the images in their neighborhood. Those attractors correspond to central, iconic views of single objects or buildings, which we efficiently extract using a medoid shift search with a novel distance measure. We propose efficient algorithms for performing this search. Most importantly, our approach performs only an efficient local exploration of the matching graph that makes it applicable for large-scale analysis of photo collections. We show experimental results validating our approach on a dataset of 500k images of the inner city of Paris.
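
For intuition, a minimal medoid shift iteration of the kind the search above builds on is sketched below; the pairwise distance matrix stands in for the paper's homography-overlap-based measure, and the seed and bandwidth parameters are illustrative assumptions.

```python
import numpy as np

def medoid_shift(dist, seed, bandwidth):
    """Hill-climb from a seed image to a local mode (medoid) of the collection.

    dist      -- (N, N) symmetric pairwise distances (a stand-in for the
                 paper's homography-overlap-based measure)
    seed      -- index of the starting image
    bandwidth -- kernel width controlling the neighbourhood size
    """
    current, visited = seed, set()
    while current not in visited:
        visited.add(current)
        # Gaussian kernel weights centred on the current image
        w = np.exp(-(dist[current] / bandwidth) ** 2)
        # the next medoid minimises the kernel-weighted sum of distances
        current = int(np.argmin(dist @ w))
    return current
```

Seeding such climbs from many images and merging climbs that reach the same mode gives a local exploration of the matching graph, in the spirit of the approach above.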
Citations: 51
Multiscale, curvature-based shape representation for surfaces
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126457
Ruirui Jiang, X. Gu
This paper presents a multiscale, curvature-based shape representation technique for general genus zero closed surfaces. The method is invariant under rotation, translation, scaling, or general isometric deformations; it is robust to noise and preserves intrinsic symmetry. The method is a direct generalization of the Curvature Scale Space (CSS) shape descriptor for planar curves. In our method, the Riemannian metric of the surface is deformed under Ricci flow, such that the Gaussian curvature evolves according to a heat diffusion process. Eventually the surface becomes a sphere with constant positive curvature everywhere. The evolution of zero curvature curves on the surface is utilized as the shape descriptor. Our experimental results on a 3D geometric database with about 80 shapes demonstrate the efficiency and efficacy of the method.
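
For reference, the normalized surface Ricci flow driving this evolution can be written in its standard form (quoted from the general theory, not from the paper):

```latex
\frac{\partial g(t)}{\partial t} = \bigl(2\bar{K} - 2K(t)\bigr)\, g(t),
\qquad
\bar{K} = \frac{2\pi\,\chi(S)}{A(S)}
```

where \(K(t)\) is the Gaussian curvature, \(\chi(S)\) the Euler characteristic (2 for genus zero), and \(A(S)\) the total surface area; the flow converges to the constant-curvature round metric, matching the sphere limit described above.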
Citations: 6
HMDB: A large video database for human motion recognition
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126543
Hilde Kuehne, Hueihan Jhuang, Estíbaliz Garrote, T. Poggio, Thomas Serre
With nearly one billion online videos viewed every day, an emerging new frontier in computer vision research is recognition and search in video. While much effort has been devoted to the collection and annotation of large scalable static image datasets containing thousands of image categories, human action datasets lag far behind. Current action recognition databases contain on the order of ten different action categories collected under fairly controlled conditions. State-of-the-art performance on these datasets is now near ceiling and thus there is a need for the design and creation of new benchmarks. To address this issue we collected the largest action video database to date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube. We use this database to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions such as camera motion, viewpoint, video quality and occlusion.
Citations: 3246
Spatiotemporal oriented energies for spacetime stereo
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126362
Mikhail Sizintsev, Richard P. Wildes
This paper presents a novel approach to recovering temporally coherent estimates of 3D structure of a dynamic scene from a sequence of binocular stereo images. The approach is based on matching spatiotemporal orientation distributions between left and right temporal image streams, which encapsulates both local spatial and temporal structure for disparity estimation. By capturing spatial and temporal structure in this unified fashion, both sources of information combine to yield disparity estimates that are naturally temporally coherent, while helping to resolve matches that might be ambiguous when either source is considered alone. Further, by allowing subsets of the orientation measurements to support different disparity estimates, an approach to recovering multilayer disparity from spacetime stereo is realized. The approach has been implemented with real-time performance on commodity GPUs. Empirical evaluation shows that the approach yields qualitatively and quantitatively superior disparity estimates in comparison to various alternative approaches, including the ability to provide accurate multilayer estimates in the presence of (semi)transparent and specular surfaces.
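
As a toy illustration of the distribution-matching idea, the sketch below builds a disparity cost volume by comparing L1-normalized oriented-energy vectors between the two views. The energies themselves, and everything else in the pipeline including the multilayer recovery, are assumed precomputed and omitted; all names are illustrative.

```python
import numpy as np

def disparity_costs(left, right, max_disp):
    """Cost volume from per-pixel spatiotemporal oriented-energy vectors.

    left, right -- (H, W, K) arrays holding K oriented-energy responses
                   per pixel (their computation is not shown here)
    Returns a (H, W, max_disp) volume; lower cost = better match.
    """
    H, W, K = left.shape
    # L1-normalise so pixels are compared as orientation *distributions*
    l = left / (left.sum(axis=-1, keepdims=True) + 1e-8)
    r = right / (right.sum(axis=-1, keepdims=True) + 1e-8)
    cost = np.full((H, W, max_disp), np.inf)
    for d in range(max_disp):
        # left pixel (y, x) is compared against right pixel (y, x - d)
        cost[:, d:, d] = np.abs(l[:, d:] - r[:, :W - d]).sum(axis=-1)
    return cost
```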
Citations: 5
From contours to 3D object detection and pose estimation
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126342
Nadia Payet, S. Todorovic
This paper addresses view-invariant object detection and pose estimation from a single image. While recent work focuses on object-centered representations of point-based object features, we revisit the viewer-centered framework, and use image contours as basic features. Given training examples of arbitrary views of an object, we learn a sparse object model in terms of a few view-dependent shape templates. The shape templates are jointly used for detecting object occurrences and estimating their 3D poses in a new image. Instrumental to this is our new mid-level feature, called bag of boundaries (BOB), aimed at lifting from individual edges toward their more informative summaries for identifying object boundaries amidst the background clutter. In inference, BOBs are placed on deformable grids both in the image and the shape templates, and then matched. This is formulated as a convex optimization problem that accommodates invariance to non-rigid, locally affine shape deformations. Evaluation on benchmark datasets demonstrates our competitive results relative to the state of the art.
Citations: 139
Understanding egocentric activities
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126269
A. Fathi, Ali Farhadi, James M. Rehg
We present a method to analyze daily activities, such as meal preparation, using video from an egocentric camera. Our method performs inference about activities, actions, hands, and objects. Daily activities are a challenging domain for activity recognition, and one well-suited to an egocentric approach. In contrast to previous activity recognition methods, our approach does not require pre-trained detectors for objects and hands. Instead we demonstrate the ability to learn a hierarchical model of an activity by exploiting the consistent appearance of objects, hands, and actions that results from the egocentric context. We show that joint modeling of activities, actions, and objects leads to superior performance in comparison to the case where they are considered independently. We introduce a novel representation of actions based on object-hand interactions and experimentally demonstrate the superior performance of our representation in comparison to standard activity representations such as bag of words.
Citations: 400
Weakly supervised object detector learning with model drift detection
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126261
P. Siva, T. Xiang
A conventional approach to learning object detectors uses fully supervised learning techniques, which assume that a training image set with manually annotated object bounding boxes is provided. The manual annotation of objects in large image sets is tedious and unreliable. Therefore, a weakly supervised learning approach is desirable, where the training set needs only binary labels regarding whether an image contains the target object class. In the weakly supervised approach a detector is used to iteratively annotate the training set and learn the object model. We present a novel weakly supervised learning framework for learning an object detector. Our framework incorporates a new initial annotation model to start the iterative learning of a detector and a model drift detection method that is able to detect and stop the iterative learning when the detector starts to drift away from the objects of interest. We demonstrate the effectiveness of our approach on the challenging PASCAL 2007 dataset.
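
Schematically, the iterate-and-stop structure described above reduces to the loop below. Every function passed in is a placeholder assumption; the paper's initial annotation model and its specific drift measure are not reproduced here.

```python
def learn_detector(pos_images, init_boxes, train, localize, drift_score,
                   max_iters=10, drift_tol=0.5):
    """Sketch of a weakly supervised detector-learning loop with a drift
    check (placeholders throughout, not the paper's exact algorithm).

    pos_images  -- images whose binary label says the class is present
    init_boxes  -- starting box annotations (e.g. from an initial
                   annotation model)
    train       -- fits a detector from (image, box) pairs
    localize    -- re-annotates an image with the current detector
    drift_score -- how far new annotations drift from earlier rounds
    """
    boxes = list(init_boxes)
    history = [boxes]
    detector = train(pos_images, boxes)
    for _ in range(max_iters):
        boxes = [localize(detector, im) for im in pos_images]
        if drift_score(history, boxes) > drift_tol:
            break                      # drift detected: stop re-training
        history.append(boxes)
        detector = train(pos_images, boxes)
    return detector
```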
Citations: 160
Latent Low-Rank Representation for subspace segmentation and feature extraction
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126422
Guangcan Liu, Shuicheng Yan
Low-Rank Representation (LRR) [16, 17] is an effective method for exploring the multiple subspace structures of data. Usually, the observed data matrix itself is chosen as the dictionary, which is a key aspect of LRR. However, such a strategy may depress the performance, especially when the observations are insufficient and/or grossly corrupted. In this paper we therefore propose to construct the dictionary by using both observed and unobserved, hidden data. We show that the effects of the hidden data can be approximately recovered by solving a nuclear norm minimization problem, which is convex and can be solved efficiently. The formulation of the proposed method, called Latent Low-Rank Representation (LatLRR), seamlessly integrates subspace segmentation and feature extraction into a unified framework, and thus provides us with a solution for both subspace segmentation and feature extraction. As a subspace segmentation algorithm, LatLRR is an enhanced version of LRR and outperforms the state-of-the-art algorithms. Being an unsupervised feature extraction algorithm, LatLRR is able to robustly extract salient features from corrupted data, and thus can work much better than the benchmark that utilizes the original data vectors as features for classification. Compared to dimension reduction based methods, LatLRR is more robust to noise.
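
For reference, the LatLRR program is commonly stated as follows, with X the observed data, Z the low-rank representation, L the term capturing the hidden data's effect, E a sparse error, and λ > 0 a trade-off weight:

```latex
\min_{Z,\,L,\,E}\;\; \|Z\|_{*} + \|L\|_{*} + \lambda\,\|E\|_{1}
\qquad \text{s.t.} \qquad X = XZ + LX + E
```

Here \(\|\cdot\|_{*}\) is the nuclear norm (the convex surrogate for rank) and \(\|\cdot\|_{1}\) the \(\ell_1\) norm; the representation Z serves subspace segmentation while the latent term L supplies the extracted features.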
Citations: 597
Cluster-based color space optimizations
Pub Date : 2011-11-06 DOI: 10.1109/ICCV.2011.6126366
Cheryl Lau, W. Heidrich, Rafał K. Mantiuk
Transformations between different color spaces and gamuts are ubiquitous operations performed on images. Often, these transformations involve information loss, for example when mapping from color to grayscale for printing, from multispectral or multiprimary data to tristimulus spaces, or from one color gamut to another. In all these applications, there exists a straightforward “natural” mapping from the source space to the target space, but the mapping is not bijective, resulting in information loss due to metamerism and similar effects. We propose a cluster-based approach for optimizing the transformation for individual images in a way that preserves as much of the information as possible from the source space while staying as faithful as possible to the natural mapping. Our approach can be applied to a host of color transformation problems including color to gray, gamut mapping, conversion of multispectral and multiprimary data to tristimulus colors, and image optimization for color deficient viewers.
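
As a toy, heavily simplified version of the idea (one grey value per colour cluster, trading fidelity to the natural luminance mapping against preservation of inter-cluster contrast), consider the least-squares sketch below. It illustrates the flavour of such optimizations, not the paper's formulation; all names and weights are assumptions.

```python
import numpy as np

def cluster_gray_values(means, alpha=0.5):
    """Assign one grey value per colour cluster: stay close to the natural
    luminance mapping while preserving pairwise contrast between clusters.
    (A toy least-squares sketch, not the paper's optimisation.)

    means -- (k, 3) cluster mean colours, rows = [R, G, B] in [0, 1]
    alpha -- weight on fidelity to the natural mapping vs. contrast
    """
    k = len(means)
    g0 = means @ np.array([0.299, 0.587, 0.114])      # natural grey mapping
    rows, rhs = [], []
    for i in range(k):                                 # fidelity terms
        e = np.zeros(k)
        e[i] = np.sqrt(alpha)
        rows.append(e)
        rhs.append(np.sqrt(alpha) * g0[i])
    w = np.sqrt(1.0 - alpha)
    for i in range(k):                                 # contrast terms
        for j in range(i + 1, k):
            e = np.zeros(k)
            e[i], e[j] = w, -w
            # keep the sign of the natural ordering, but restore the
            # magnitude of the full colour difference
            target = np.sign(g0[i] - g0[j]) * np.linalg.norm(means[i] - means[j])
            rows.append(e)
            rhs.append(w * target)
    g, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.clip(g, 0.0, 1.0)
```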
Citations: 58