A family of contextual measures of similarity between distributions with application to image retrieval
F. Perronnin, Yan Liu, J. Renders
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206505

We introduce a novel family of contextual measures of similarity between distributions: the similarity between two distributions q and p is measured in the context of a third distribution u. In our framework, any traditional measure of similarity or dissimilarity has a contextual counterpart. We show that for two important families of divergences (Bregman and Csiszár), computing the contextual similarity amounts to solving a convex optimization problem. We focus on the case of multinomials and explain how to compute the similarity in practice for several well-known measures. These contextual measures are then applied to image retrieval, where the context u is estimated from the neighbors of a query q. A main benefit of our approach is that different contexts, and especially contexts at multiple scales (i.e. broad and narrow contexts), provide different views of the same problem, and combining these views can improve retrieval accuracy. We show on two very different datasets (one of photographs, the other of document images) that the proposed measures have a relatively small positive impact on macro Average Precision (which measures ranking alone) and a large positive impact on micro Average Precision (which measures both ranking and the consistency of scores across queries).

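As a point of reference for the measures this paper makes contextual, here is a minimal sketch of a plain (non-contextual) KL divergence between multinomials. The uniform `u` below is only an illustrative context distribution; the paper's contextual counterpart is obtained by solving a convex problem involving u, not by direct evaluation like this.

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two multinomials given as probability lists.
    A small eps guards against log(0) on zero-probability bins."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]
u = [1 / 3, 1 / 3, 1 / 3]  # a uniform "context" distribution, for illustration

print(kl_divergence(q, p))  # non-negative, zero iff q == p
print(kl_divergence(q, u))
```
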
Discriminative subvolume search for efficient action detection
Junsong Yuan, Zicheng Liu, Ying Wu
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206671

Actions are spatio-temporal patterns that can be characterized by collections of spatio-temporal invariant features. Detecting actions amounts to finding re-occurrences of such patterns (e.g. through pattern matching). This paper addresses two critical issues in pattern-matching-based action detection: (1) the efficiency of pattern search in 3D videos and (2) tolerance to intra-pattern variations of actions. Our contributions are two-fold. First, we propose a discriminative pattern matching scheme, naive-Bayes-based mutual information maximization (NBMIM), for multi-class action categorization; it improves on state-of-the-art results on the standard KTH dataset. Second, we propose a novel search algorithm that locates the optimal subvolume in the 3D video space for efficient action detection. Our method is purely data-driven and does not rely on object detection, tracking, or background subtraction. It handles intra-pattern variations of actions, such as scale and speed variations, and is insensitive to dynamic and cluttered backgrounds and even partial occlusions. Experiments on varied datasets, including the KTH and CMU action datasets, demonstrate the effectiveness and efficiency of our method.

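The core search problem above — find the region whose summed per-feature discriminative scores is maximal — has a simple 1D analogue. The sketch below (my toy, not the paper's branch-and-bound 3D search) uses Kadane's algorithm to find the best-scoring contiguous interval given NBMIM-style per-feature scores (positive where the action class is likely).

```python
def max_score_interval(scores):
    """Kadane's algorithm: returns (best_sum, lo, hi) for the contiguous
    interval [lo, hi) maximizing the summed score. A 1D stand-in for the
    paper's optimal-subvolume search in 3D video space."""
    best_sum, best_lo, best_hi = float("-inf"), 0, 0
    cur_sum, cur_lo = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:          # restarting beats extending a non-positive run
            cur_sum, cur_lo = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_lo, best_hi = cur_sum, cur_lo, i + 1
    return best_sum, best_lo, best_hi

# Hypothetical per-feature log-likelihood-ratio scores along time.
scores = [-1.0, 2.0, 3.0, -4.0, 1.5, 2.5, -1.0]
print(max_score_interval(scores))  # (5.0, 1, 3)
```
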
Multiphase geometric couplings for the segmentation of neural processes
Amelio Vázquez Reina, E. Miller, H. Pfister
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206524

The ability to constrain the geometry of deformable models for image segmentation can be useful when information about the expected shape or positioning of the objects in a scene is known a priori. An example of this occurs when segmenting neural cross sections in electron microscopy. Such images often contain multiple nested boundaries separating regions of homogeneous intensities. For these applications, multiphase level sets provide a partitioning framework that allows for the segmentation of multiple deformable objects by combining several level set functions. Although there has been much effort in the study of statistical shape priors that can be used to constrain the geometry of each partition, none of these methods allow for the direct modeling of geometric arrangements of partitions. In this paper, we show how to define elastic couplings between multiple level set functions to model ribbon-like partitions. We build such couplings using dynamic force fields that can depend on the image content and the relative location and shape of the level set functions. To the best of our knowledge, this is the first work that shows a direct way of geometrically constraining multiphase level sets for image segmentation. We demonstrate the robustness of our method by comparing it with previous level set segmentation methods.

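To make the idea of an elastic coupling concrete, here is a deliberately simplified 1D sketch (mine, not the paper's formulation): two fronts are represented by sampled signed-distance functions, and a spring-like energy is zero exactly when they sit a target ribbon width apart. The paper instead couples 2D fronts through dynamic, image-dependent force fields.

```python
def ribbon_coupling_energy(phi_out, phi_in, width, k=1.0):
    """Spring-like coupling between two sampled 1D signed-distance
    functions. For parallel fronts, phi_out - phi_in equals the (constant)
    gap between their zero crossings, so this energy vanishes exactly when
    the fronts are `width` apart."""
    n = len(phi_out)
    return 0.5 * k * sum((a - b - width) ** 2
                         for a, b in zip(phi_out, phi_in)) / n

xs = range(10)
phi_out = [x - 2.0 for x in xs]   # outer front crosses zero at x = 2
phi_in = [x - 5.0 for x in xs]    # inner front crosses zero at x = 5, gap = 3
print(ribbon_coupling_energy(phi_out, phi_in, width=3.0))  # 0.0: exactly at target width
print(ribbon_coupling_energy(phi_out, phi_in, width=2.0))  # positive: width violated
```
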
Joint and implicit registration for face recognition
Peng Li, S. Prince
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206607

Contemporary face recognition algorithms rely on precise localization of keypoints (corner of eye, nose, etc.). Unfortunately, finding keypoints reliably and accurately remains a hard problem. In this paper we pose two questions. First, is it possible to exploit the gallery image in order to find keypoints in the probe image? For instance, consider finding the left eye in the probe image. Rather than using a generic eye model, we use a model that is informed by the appearance of the eye in the gallery image. To this end we develop a probabilistic model which combines recognition and keypoint localization. Second, is it necessary to localize keypoints? Alternatively, we can consider keypoint position as a hidden variable which we marginalize over in a Bayesian manner. We demonstrate that both of these innovations improve performance relative to conventional methods in both frontal and cross-pose face recognition.

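The second idea above — treating keypoint position as a hidden variable and marginalizing over it — boils down to a stably computed log-sum-exp over candidate positions. The sketch below is a generic illustration with hypothetical per-position scores and a uniform prior, not the paper's actual model.

```python
import math

def marginal_log_score(log_scores, log_prior):
    """log sum_x p(match | x) p(x): marginalize a per-position match
    likelihood over the hidden keypoint position x, using the standard
    log-sum-exp trick for numerical stability."""
    terms = [ls + lp for ls, lp in zip(log_scores, log_prior)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical log-likelihoods for four candidate eye locations in the
# probe image, with a uniform prior over positions.
log_scores = [-5.0, -1.0, -4.0, -6.0]
log_prior = [math.log(0.25)] * 4
print(marginal_log_score(log_scores, log_prior))
```

The marginal score is dominated by the best position but never commits to it, which is the point: no hard localization decision is made.
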
Global optimization for alignment of generalized shapes
Hongsheng Li, Tian Shen, Xiaolei Huang
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206548

In this paper, we introduce a novel algorithm for solving global shape registration problems. We use gray-scale "images" to represent source shapes, and propose a novel two-component Gaussian Mixture (GM) distance map representation for target shapes. Based on this flexible, asymmetric, image-based representation, we define a new energy function that proves to be a more robust shape dissimilarity metric and can be computed efficiently. Such efficiency is essential for global optimization methods; we adopt one of them, Particle Swarm Optimization (PSO), to effectively estimate the global optimum of the new energy function. Experiments and comparisons on generalized shape data, including continuous shapes, unstructured sparse point sets, and gradient maps, demonstrate the robustness and effectiveness of the algorithm.

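For readers unfamiliar with PSO, here is a minimal, self-contained implementation minimizing a toy quadratic (the function, box bounds, and hyperparameters below are illustrative defaults, not the paper's settings; the paper minimizes its image-based shape dissimilarity instead).

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal Particle Swarm Optimization over the box [lo, hi]^dim.
    Each particle tracks its personal best; the swarm tracks a global best;
    velocities blend inertia with pulls toward both bests."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i][:], fx
    return gbest, gbest_f

# Toy energy with its global minimum at (1, -2).
best, best_f = pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, dim=2)
print(best, best_f)
```

Because PSO only evaluates the energy (no gradients), it pairs naturally with the efficiently computable dissimilarity metric the paper proposes.
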
An instance selection approach to Multiple Instance Learning
Zhouyu Fu, A. Robles-Kelly
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206655

Multiple-instance learning (MIL) is a new paradigm of supervised learning that deals with the classification of bags, where each bag is presented as a collection of instances from which features are extracted. In MIL, we are usually confronted with a large instance space even for moderately sized data sets, since each bag may contain many instances. It is therefore important to design efficient instance pruning and selection techniques that speed up the learning process without compromising performance. In this paper, we address instance selection in multiple instance learning and propose IS-MIL, an instance selection framework for tackling large-scale MIL problems. IS-MIL is based on an alternating optimisation scheme that iteratively repeats the steps of instance selection/updating and classifier learning, and is guaranteed to converge. Experimental results demonstrate the utility and efficiency of the proposed approach compared to the alternatives.

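To ground the bag/instance terminology, here is a toy sketch of the standard MIL assumption (a bag is positive if any instance is) together with a witness-selection step in the spirit of the selection/learning alternation described above. The linear scorer and the bag data are illustrative, not from the paper.

```python
def bag_score(instances, score):
    """Standard MIL assumption: a bag's score is the max over the scores of
    its instances, so one positive instance makes the bag positive."""
    return max(score(x) for x in instances)

def select_witness(instances, score):
    """Pick the instance responsible for the bag score. An alternating
    scheme like IS-MIL interleaves a selection step of this flavour with
    re-training the instance-level classifier."""
    return max(instances, key=score)

score = lambda x: x[0] - x[1]          # toy linear instance scorer
bag = [(0.2, 0.9), (1.4, 0.3), (0.5, 0.5)]
print(bag_score(bag, score))           # the bag's best instance score
print(select_witness(bag, score))      # the witness instance itself
```
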
Symmetry integrated region-based image segmentation
Yu Sun, B. Bhanu
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206570

Symmetry is an important cue for machine perception that involves high-level knowledge of image components. Unlike most previous research, which only computes symmetry in an image, this paper integrates symmetry with image segmentation to improve segmentation performance. The symmetry integration is used to optimize both the segmentation and the symmetry of regions simultaneously. Interest points are initially extracted from an image and then refined to detect the symmetry axis. A symmetry affinity matrix is used explicitly as a constraint in a region-growing algorithm in order to refine the symmetry of the segmented regions. Experimental results and comparisons on a wide range of images indicate a promising improvement from symmetry-integrated image segmentation over other segmentation methods that do not exploit symmetry.

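A symmetry affinity is, at heart, a score for how well a signal matches its own mirror image. The toy function below (my illustration; the paper builds a full affinity matrix from 2D patches around a detected axis) scores a 1D intensity profile about its midpoint.

```python
def symmetry_affinity(row):
    """Toy symmetry affinity of a 1D intensity profile about its midpoint:
    1 - mean absolute difference between the profile and its mirror image,
    normalized by the intensity span. Returns 1.0 for a perfectly
    symmetric profile."""
    mirrored = row[::-1]
    diff = sum(abs(a - b) for a, b in zip(row, mirrored)) / len(row)
    span = max(row) - min(row) or 1.0   # avoid dividing by zero on flat rows
    return 1.0 - diff / span

print(symmetry_affinity([1.0, 2.0, 5.0, 2.0, 1.0]))  # symmetric profile
print(symmetry_affinity([1.0, 2.0, 3.0, 4.0, 5.0]))  # asymmetric ramp
```
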
Nonparametric scene parsing: Label transfer via dense scene alignment
Ce Liu, Jenny Yuen, A. Torralba
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206536

In this paper we propose a novel nonparametric approach to object recognition and scene parsing using dense scene alignment. Given an input image, we retrieve its best matches from a large database of annotated images using our modified, coarse-to-fine SIFT flow algorithm, which aligns the structures within two images. Based on the dense scene correspondence obtained from the SIFT flow, our system warps the existing annotations and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Our nonparametric scene parsing system achieves promising experimental results on a challenging database. Compared to existing object recognition approaches that require training for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.

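The nonparametric recipe — retrieve similar annotated images, then transfer their annotations — can be sketched in a few lines. The version below transfers a single scene label by nearest-neighbour histogram matching; this is a drastically simplified stand-in (my own toy, with made-up histograms and labels) for the paper's warping of dense per-pixel annotations along SIFT-flow correspondences.

```python
def l1_distance(h1, h2):
    """L1 distance between two feature histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def transfer_label(query_hist, database):
    """Label the query with the annotation of its nearest neighbour in a
    database of (histogram, label) pairs: retrieval, then transfer."""
    _, label = min(database, key=lambda item: l1_distance(query_hist, item[0]))
    return label

database = [([0.8, 0.1, 0.1], "street"),
            ([0.1, 0.8, 0.1], "beach"),
            ([0.1, 0.1, 0.8], "forest")]
print(transfer_label([0.7, 0.2, 0.1], database))  # nearest annotated scene wins
```
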
Visual tracking with online Multiple Instance Learning
Boris Babenko, Ming-Hsuan Yang, Serge J. Belongie
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206737

In this paper, we address the problem of learning an adaptive appearance model for object tracking. In particular, a class of tracking techniques called "tracking by detection" has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. The classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrade the classifier and can cause further drift. In this paper we show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems and can therefore lead to a more robust tracker with fewer parameter tweaks. We present a novel online MIL algorithm for object tracking that achieves superior results with real-time performance.

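The reason MIL tolerates sloppy example extraction is visible in the noisy-OR bag model commonly used in MIL boosting: a bag of image patches cropped around the tracker state is positive as long as at least one patch is the object, so no single patch has to be labeled exactly right. A minimal sketch (the probabilities are illustrative):

```python
def noisy_or(instance_probs):
    """Noisy-OR bag probability: P(bag positive) = 1 - prod(1 - p_i).
    The bag is positive if at least one instance is positive, so one
    confident instance suffices even when its neighbours are background."""
    prod = 1.0
    for p in instance_probs:
        prod *= (1.0 - p)
    return 1.0 - prod

# A bag of patches around the tracker state: one patch clearly contains
# the object, the others are slightly off-target.
print(noisy_or([0.1, 0.9, 0.2]))   # high: the bag contains the object
print(noisy_or([0.1, 0.1, 0.1]))   # low: probably all background
```
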
Stacks of convolutional Restricted Boltzmann Machines for shift-invariant feature learning
Mohammad Norouzi, Mani Ranjbar, Greg Mori
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206577

In this paper we present a method for learning class-specific features for recognition. Recently, a greedy layer-wise procedure was proposed to initialize the weights of deep belief networks by viewing each layer as a separate restricted Boltzmann machine (RBM). We develop the convolutional RBM (C-RBM), a variant of the RBM in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling, and we learn the feature parameters of the first and third layers by viewing them as separate C-RBMs. The outputs of the feature extraction hierarchy are then fed as input to a discriminative classifier. We demonstrate experimentally that the extracted features are effective for object detection, using them to obtain performance comparable to the state of the art on handwritten digit recognition and pedestrian detection.

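The two layer types in the hierarchy above — weight-tied filtering and maximum subsampling — are easy to show in 1D. The sketch below (a toy illustration, not the C-RBM's learned filters) applies a single shared filter across a signal, then max-pools the responses; sliding one shared filter everywhere is exactly the weight-sharing idea the C-RBM adds to the RBM.

```python
def filter1d_valid(signal, kernel):
    """'Valid' correlation-style filtering (no kernel flip, as is common in
    vision code) with one shared filter slid across the whole signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_subsample(xs, pool=2):
    """Non-overlapping max pooling: the 'maximum subsampling' layer, which
    keeps the strongest response in each window for shift tolerance."""
    return [max(xs[i:i + pool]) for i in range(0, len(xs) - pool + 1, pool)]

signal = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
edge = [1.0, -1.0]                       # toy edge-detecting filter
responses = filter1d_valid(signal, edge)
print(responses)                         # [-1.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
print(max_subsample(responses, pool=2))  # [1.0, 0.0, 1.0]
```
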