Motion pattern interpretation and detection for tracking moving vehicles in airborne video
Qian Yu, G. Medioni
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206541
Detection and tracking of moving vehicles in airborne videos is a challenging problem. Many approaches have been proposed to improve motion segmentation on a frame-by-frame and pixel-by-pixel basis; however, little attention has been paid to analyzing long-term motion patterns, which are a distinctive property of moving vehicles in airborne videos. In this paper, we provide a straightforward geometric interpretation of a general motion pattern in the 4D space (x, y, vx, vy). We propose to use the tensor voting computational framework to detect and segment such motion patterns in 4D space. Specifically, in airborne videos, we analyze the essential difference between the motion patterns caused by parallax and those caused by independently moving objects, which leads to a practical method for segmenting the motion patterns (flows) created by moving vehicles in stabilized airborne videos. The flows are used in turn to facilitate detection and tracking of each individual object in the flow. Conceptually, this approach is similar to "track-before-detect" techniques, which incorporate temporal information into the process as early as possible. As shown in the experiments, many difficult cases in airborne videos, such as parallax, noisy background modeling, and long-term occlusions, can be addressed by our approach.
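The 4D representation is concrete enough to sketch. Below is a minimal, hypothetical Python sketch of assembling the (x, y, vx, vy) samples from point tracks in a stabilized sequence; it substitutes a generic density-based clustering (DBSCAN) for the paper's tensor voting, so it illustrates only the data layout, not the actual voting framework.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def motion_samples(tracks):
    """tracks: (num_points, num_frames, 2) stabilized image positions.
    Returns an (N, 4) array of (x, y, vx, vy) samples."""
    pos = tracks[:, :-1, :]                      # position at frame t
    vel = tracks[:, 1:, :] - tracks[:, :-1, :]   # finite-difference velocity
    return np.concatenate([pos, vel], axis=-1).reshape(-1, 4)

# Coherently moving vehicles trace smooth, elongated structures in this 4D
# space, while parallax-induced samples follow camera-dependent directions;
# the clustering below is only a crude stand-in for tensor voting.
tracks = np.cumsum(np.random.rand(50, 20, 2), axis=1)  # toy drifting tracks
X = motion_samples(tracks)
labels = DBSCAN(eps=3.0, min_samples=10).fit_predict(X)
print("flows found:", len(set(labels) - {-1}))
```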
Fast multiple shape correspondence by pre-organizing shape instances
B. Munsell, Andrew Temlyakov, Song Wang
Pub Date: 2009-06-20 | DOI: 10.1109/cvpr.2009.5206611
Accurately identifying corresponding landmarks across a population of shape instances is the major challenge in constructing statistical shape models. In general, shape-correspondence methods fall into one of two categories: global methods and pair-wise methods. In this paper, we develop a new method that attempts to address the limitations of both. In particular, we reorganize the input population into a tree structure that incorporates global information about the population of shape instances, where each node in the tree represents a shape instance and each edge connects two very similar shape instances. Using this organized tree, neighboring shape instances can be corresponded efficiently and accurately by a pair-wise method. In the experiments, we evaluate the proposed method, compare its performance to that of five available shape-correspondence methods, and show that the proposed method achieves the accuracy of a global method at the speed of a pair-wise method.
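The abstract does not specify how the tree is built; a minimum spanning tree over pairwise shape dissimilarities is one natural instantiation. The sketch below assumes pre-aligned landmark shapes and a crude mean-distance metric, both illustrative choices rather than the paper's method.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def organize_shapes(shapes):
    """shapes: (num_shapes, num_points, 2) landmark coordinates.
    Returns (i, j) edge pairs of a tree linking similar shapes."""
    n = len(shapes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # crude dissimilarity: mean landmark distance (assumes shapes
            # are pre-aligned; a real system would use a proper shape metric)
            d[i, j] = d[j, i] = np.mean(np.linalg.norm(shapes[i] - shapes[j], axis=1))
    mst = minimum_spanning_tree(d).tocoo()
    return list(zip(mst.row, mst.col))

# Pair-wise correspondence is then run only along the tree's edges,
# propagating landmarks from a root shape to all others.
shapes = np.random.rand(8, 32, 2)
print(organize_shapes(shapes))
```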
{"title":"Fast multiple shape correspondence by pre-organizing shape instances","authors":"B. Munsell, Andrew Temlyakov, Song Wang","doi":"10.1109/cvpr.2009.5206611","DOIUrl":"https://doi.org/10.1109/cvpr.2009.5206611","url":null,"abstract":"Accurately identifying corresponded landmarks from a population of shape instances is the major challenge in constructing statistical shape models. In general, shape-correspondence methods can be grouped into one of two categories: global methods and pair-wise methods. In this paper, we develop a new method that attempts to address the limitations of both the global and pair-wise methods. In particular, we reorganize the input population into a tree structure that incorporates global information about the population of shape instances, where each node in the tree represents a shape instance and each edge connects two very similar shape instances. Using this organized tree, neighboring shape instances can be corresponded efficiently and accurately by a pair-wise method. In the experiments, we evaluate the proposed method and compare its performance to five available shape correspondence methods and show the proposed method achieves the accuracy of a global method with speed of a pair-wise method.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123982331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a practical face recognition system: Robust registration and illumination by sparse representation
Andrew Wagner, John Wright, Arvind Ganesh, Zihan Zhou, Yi Ma
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206654
Most contemporary face recognition algorithms work well under laboratory conditions but degrade when tested in less-controlled environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, alignment, pose, and occlusion. In this paper, we propose a simple and practical face recognition system that achieves a high degree of robustness and stability to all these variations. We demonstrate how to use tools from sparse representation to align a test face image with a set of frontal training images in the presence of significant registration error and occlusion. We thoroughly characterize the region of attraction for our alignment algorithm on public face datasets such as Multi-PIE. We further study how to obtain a sufficient set of training illuminations for linearly interpolating practical lighting conditions. We have implemented a complete face recognition system, including a projector-based training acquisition system, in order to evaluate how our algorithms work under practical testing conditions. We show that our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.
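The sparse-representation machinery behind such systems can be sketched in a few lines. This is a hedged toy of the classification step only: code a test image as a sparse combination of training images, then pick the class with the smallest residual. The Lasso solver and synthetic data are stand-ins, and the paper's alignment and occlusion handling are omitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, alpha=0.01):
    """A: (d, n) matrix of stacked training images (columns normalized),
    labels: length-n class labels, y: length-d test image."""
    x = Lasso(alpha=alpha, max_iter=10000).fit(A, y).coef_  # sparse code
    residuals = {}
    for c in set(labels):
        mask = np.array(labels) == c
        residuals[c] = np.linalg.norm(y - A[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 40)); A /= np.linalg.norm(A, axis=0)
labels = [i // 10 for i in range(40)]          # 4 classes, 10 images each
y = A[:, 3] + 0.05 * rng.normal(size=100)      # noisy copy of a class-0 image
print(src_classify(A, labels, y))              # expected: 0
```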
{"title":"Towards a practical face recognition system: Robust registration and illumination by sparse representation","authors":"Andrew Wagner, John Wright, Arvind Ganesh, Zihan Zhou, Yi Ma","doi":"10.1109/CVPR.2009.5206654","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206654","url":null,"abstract":"Most contemporary face recognition algorithms work well under laboratory conditions but degrade when tested in less-controlled environments. This is mostly due to the difficulty of simultaneously handling variations in illumination, alignment, pose, and occlusion. In this paper, we propose a simple and practical face recognition system that achieves a high degree of robustness and stability to all these variations. We demonstrate how to use tools from sparse representation to align a test face image with a set of frontal training images in the presence of significant registration error and occlusion. We thoroughly characterize the region of attraction for our alignment algorithm on public face datasets such as Multi-PIE. We further study how to obtain a sufficient set of training illuminations for linearly interpolating practical lighting conditions. We have implemented a complete face recognition system, including a projector-based training acquisition system, in order to evaluate how our algorithms work under practical testing conditions. We show that our system can efficiently and effectively recognize faces under a variety of realistic conditions, using only frontal images under the proposed illuminations as training.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123988885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI)
A. Goh, C. Lenglet, P. Thompson, R. Vidal
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206843
High angular resolution diffusion imaging has become an important magnetic resonance technique for in vivo imaging. Most current research in this field focuses on developing methods for computing the orientation distribution function (ODF), which is the probability distribution function of water molecule diffusion along any angle on the sphere. In this paper, we present a Riemannian framework to carry out computations on an ODF field. The proposed framework does not require that the ODFs be represented by any fixed parameterization, such as a mixture of von Mises-Fisher distributions or a spherical harmonic expansion. Instead, we use a non-parametric representation of the ODF, and exploit the fact that under the square-root re-parameterization, the space of ODFs forms a Riemannian manifold, namely the unit Hilbert sphere. Specifically, we use Riemannian operations to perform various geometric data processing algorithms, such as interpolation, convolution and linear and nonlinear filtering. We illustrate these concepts with numerical experiments on synthetic and real datasets.
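The square-root re-parameterization the abstract describes has a simple closed form: an ODF p maps to psi = sqrt(p), a point on the unit Hilbert sphere, where geodesics are great-circle arcs. The sketch below interpolates between two ODFs discretized over K sphere directions; the discretization itself is an assumption made for illustration.

```python
import numpy as np

def odf_interp(p1, p2, t):
    """Geodesic interpolation between two discretized ODFs at parameter t."""
    psi1, psi2 = np.sqrt(p1), np.sqrt(p2)        # points on the unit sphere
    theta = np.arccos(np.clip(psi1 @ psi2, -1.0, 1.0))  # geodesic distance
    if theta < 1e-8:
        return p1
    psi = (np.sin((1 - t) * theta) * psi1 + np.sin(t * theta) * psi2) / np.sin(theta)
    return psi ** 2                              # map back to an ODF

p1 = np.full(16, 1 / 16)                         # uniform ODF over 16 directions
p2 = np.zeros(16); p2[0] = 1.0                   # fully anisotropic ODF
mid = odf_interp(p1, p2, 0.5)
print(mid.sum())                                 # stays 1 (psi stays unit norm)
```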
{"title":"A nonparametric Riemannian framework for processing high angular resolution diffusion images (HARDI)","authors":"A. Goh, C. Lenglet, P. Thompson, R. Vidal","doi":"10.1109/CVPR.2009.5206843","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206843","url":null,"abstract":"High angular resolution diffusion imaging has become an important magnetic resonance technique for in vivo imaging. Most current research in this field focuses on developing methods for computing the orientation distribution function (ODF), which is the probability distribution function of water molecule diffusion along any angle on the sphere. In this paper, we present a Riemannian framework to carry out computations on an ODF field. The proposed framework does not require that the ODFs be represented by any fixed parameterization, such as a mixture of von Mises-Fisher distributions or a spherical harmonic expansion. Instead, we use a non-parametric representation of the ODF, and exploit the fact that under the square-root re-parameterization, the space of ODFs forms a Riemannian manifold, namely the unit Hilbert sphere. Specifically, we use Riemannian operations to perform various geometric data processing algorithms, such as interpolation, convolution and linear and nonlinear filtering. We illustrate these concepts with numerical experiments on synthetic and real datasets.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129725605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient planar graph cuts with applications in Computer Vision
Frank R. Schmidt, Eno Töppe, D. Cremers
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206863
We present a fast graph cut algorithm for planar graphs. It builds on graph-theoretical results for planar graphs and leads to an efficient method that we apply to shape matching and image segmentation. In contrast to methods currently used in computer vision, the presented approach provides an upper bound on its runtime that is almost linear. In particular, we are able to match two different planar shapes of N points in O(N² log N) time and segment a given image of N pixels in O(N log N) time. We present two experimental benchmark studies which demonstrate that the presented method is also faster in practice than previously proposed graph cut methods: on planar shape matching and image segmentation we observe a speed-up of an order of magnitude, depending on resolution.
{"title":"Efficient planar graph cuts with applications in Computer Vision","authors":"Frank R. Schmidt, Eno Töppe, D. Cremers","doi":"10.1109/CVPR.2009.5206863","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206863","url":null,"abstract":"We present a fast graph cut algorithm for planar graphs. It is based on the graph theoretical work and leads to an efficient method that we apply on shape matching and image segmentation. In contrast to currently used methods in computer vision, the presented approach provides an upper bound for its runtime behavior that is almost linear. In particular, we are able to match two different planar shapes of N points in O(N2 log N) and segment a given image of N pixels in O(N log N). We present two experimental benchmark studies which demonstrate that the presented method is also in practice faster than previously proposed graph cut methods: On planar shape matching and image segmentation we observe a speed-up of an order of magnitude, depending on resolution.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127013163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A robust parametric method for bias field estimation and segmentation of MR images
Chunming Li, Chris Gatenby, Li Wang, J. Gore
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206553
This paper proposes a new energy minimization framework for simultaneous estimation of the bias field and segmentation of tissues for magnetic resonance images. The bias field is modeled as a linear combination of a set of basis functions, and thereby parameterized by the coefficients of the basis functions. We define an energy that depends on the coefficients of the basis functions, the membership functions of the tissues in the image, and the constants approximating the true signal from the corresponding tissues. This energy is convex in each of its variables. Bias field estimation and image segmentation are simultaneously achieved as the result of minimizing this energy. We provide an efficient iterative algorithm for energy minimization, which converges to the optimal solution at a fast rate. A salient advantage of our method is that its result is independent of initialization, which allows robust and fully automated application. The proposed method has been successfully applied to 3-Tesla MR images with desirable results. Comparisons with other approaches demonstrate the superior performance of this algorithm.
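Since the abstract names the ingredients (a bias field that is a linear combination of basis functions, tissue membership functions, and per-tissue constants), a toy version of the alternating minimization can be sketched. The 1-D signal, polynomial basis, hard memberships, and multiplicative image model below are all illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 400)
G = np.stack([np.ones_like(x), x, x**2])         # basis functions (3, 400)
true_c = np.array([1.0, 3.0])                    # two tissue intensities
seg_true = (rng.random(400) < 0.5).astype(int)
I = (1 + 0.4 * x) * true_c[seg_true] + 0.05 * rng.normal(size=400)

w = np.array([1.0, 0.0, 0.0])                    # init: flat bias field
c = np.array([I.min(), I.max()])
for _ in range(20):
    b = w @ G                                    # current bias field
    # hard memberships: assign each pixel to the best-fitting tissue
    seg = np.argmin((I[:, None] - b[:, None] * c[None, :]) ** 2, axis=1)
    for k in (0, 1):                             # closed-form update of c_k
        m = seg == k
        c[k] = (I[m] * b[m]).sum() / (b[m] ** 2).sum()
    A = G * c[seg]                               # least-squares update of w
    w = np.linalg.lstsq(A @ A.T, A @ I, rcond=None)[0]
# fitted values match the truth up to a global scale between bias and tissues
print(np.round(w, 2), np.round(c, 2))
```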
{"title":"A robust parametric method for bias field estimation and segmentation of MR images","authors":"Chunming Li, Chris Gatenby, Li Wang, J. Gore","doi":"10.1109/CVPR.2009.5206553","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206553","url":null,"abstract":"This paper proposes a new energy minimization framework for simultaneous estimation of the bias field and segmentation of tissues for magnetic resonance images. The bias field is modeled as a linear combination of a set of basis functions, and thereby parameterized by the coefficients of the basis functions. We define an energy that depends on the coefficients of the basis functions, the membership functions of the tissues in the image, and the constants approximating the true signal from the corresponding tissues. This energy is convex in each of its variables. Bias field estimation and image segmentation are simultaneously achieved as the result of minimizing this energy. We provide an efficient iterative algorithm for energy minimization, which converges to the optimal solution at a fast rate. A salient advantage of our method is that its result is independent of initialization, which allows robust and fully automated application. The proposed method has been successfully applied to 3-Tesla MR images with desirable results. Comparisons with other approaches demonstrate the superior performance of this algorithm.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127544805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic fetal face detection from ultrasound volumes via learning 3D and 2D information
Shaolei Feng, S. Zhou, Sara Good, D. Comaniciu
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206527
3D ultrasound imaging has been increasingly used in clinics for fetal examination. However, manually searching for the optimal view of the fetal face in 3D ultrasound volumes is cumbersome and time-consuming even for expert physicians and sonographers. In this paper we propose a learning-based approach which combines both 3D and 2D information for automatic and fast fetal face detection from 3D ultrasound volumes. Our approach applies a new technique - constrained marginal space learning - for 3D face mesh detection, and combines a boosting-based 2D profile detection to refine 3D face pose. To enhance the rendering of the fetal face, an automatic carving algorithm is proposed to remove all obstructions in front of the face based on the detected face mesh. Experiments are performed on a challenging 3D ultrasound data set containing 1010 fetal volumes. The results show that our system not only achieves excellent detection accuracy but also runs very fast - it can detect the fetal face from the 3D data in 1 second on a dual-core 2.0 GHz computer.
{"title":"Automatic fetal face detection from ultrasound volumes via learning 3D and 2D information","authors":"Shaolei Feng, S. Zhou, Sara Good, D. Comaniciu","doi":"10.1109/CVPR.2009.5206527","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206527","url":null,"abstract":"3D ultrasound imaging has been increasingly used in clinics for fetal examination. However, manually searching for the optimal view of the fetal face in 3D ultrasound volumes is cumbersome and time-consuming even for expert physicians and sonographers. In this paper we propose a learning-based approach which combines both 3D and 2D information for automatic and fast fetal face detection from 3D ultrasound volumes. Our approach applies a new technique - constrained marginal space learning - for 3D face mesh detection, and combines a boosting-based 2D profile detection to refine 3D face pose. To enhance the rendering of the fetal face, an automatic carving algorithm is proposed to remove all obstructions in front of the face based on the detected face mesh. Experiments are performed on a challenging 3D ultrasound data set containing 1010 fetal volumes. The results show that our system not only achieves excellent detection accuracy but also runs very fast - it can detect the fetal face from the 3D data in 1 second on a dual-core 2.0 GHz computer.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1969 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129973997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning semantic scene models by object classification and trajectory clustering
Tianzhu Zhang, Hanqing Lu, S. Li
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206809
Activity analysis is a basic task in video surveillance and has become an active research area. However, due to the diversity of moving-object categories and their motion patterns, developing robust semantic scene models for activity analysis remains a challenging problem in traffic scenarios. This paper proposes a novel framework to learn semantic scene models. In this framework, detected moving objects are first classified as pedestrians or vehicles via a co-trained classifier which takes advantage of multi-view information about the objects. As a result, the framework can automatically learn separate motion patterns for pedestrians and vehicles. Then, a graph is proposed to learn and cluster the motion patterns. To this end, each trajectory is parameterized and the image is divided into multiple blocks, which are taken as the nodes of the graph. Based on the trajectory parameters, the primary motion patterns in each node (block) are extracted via a Gaussian mixture model (GMM) and supplied to the graph. A graph cut algorithm is finally employed to group the motion patterns together, and trajectories are clustered to learn the semantic scene models. Experimental results and applications to real-world scenes show the validity of our proposed method.
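One stage of this pipeline is easy to make concrete: fitting a GMM to the motion observed inside each image block to extract that block's primary motion patterns. In the sketch below, the block size, the use of raw flow vectors as "trajectory parameters", and K=2 components are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def block_motion_patterns(points, flows, img_size, block=64, k=2):
    """points: (N, 2) trajectory positions, flows: (N, 2) motion vectors.
    Returns {block_id: GMM means} for blocks with enough observations."""
    patterns = {}
    bx = points[:, 0] // block
    by = points[:, 1] // block
    cols = img_size[0] // block
    ids = by * cols + bx                         # flat block index per sample
    for b in np.unique(ids):
        m = ids == b
        if m.sum() >= 10:                        # need enough samples to fit
            gmm = GaussianMixture(n_components=k).fit(flows[m])
            patterns[int(b)] = gmm.means_        # primary motions in block
    return patterns

pts = np.random.rand(500, 2) * 256
flw = np.random.randn(500, 2)
print(len(block_motion_patterns(pts, flw, (256, 256))))
```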
{"title":"Learning semantic scene models by object classification and trajectory clustering","authors":"Tianzhu Zhang, Hanqing Lu, S. Li","doi":"10.1109/CVPR.2009.5206809","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206809","url":null,"abstract":"Activity analysis is a basic task in video surveillance and has become an active research area. However, due to the diversity of moving objects category and their motion patterns, developing robust semantic scene models for activity analysis remains a challenging problem in traffic scenarios. This paper proposes a novel framework to learn semantic scene models. In this framework, the detected moving objects are first classified as pedestrians or vehicles via a co-trained classifier which takes advantage of the multiview information of objects. As a result, the framework can automatically learn motion patterns respectively for pedestrians and vehicles. Then, a graph is proposed to learn and cluster the motion patterns. To this end, trajectory is parameterized and the image is cut into multiple blocks which are taken as the nodes in the graph. Based on the parameters of trajectories, the primary motion patterns in each node (block) are extracted via Gaussian mixture model (GMM), and supplied to this graph. The graph cut algorithm is finally employed to group the motion patterns together, and trajectories are clustered to learn semantic scene models. Experimental results and applications to real world scenes show the validity of our proposed method.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129134467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mutual information-based stereo matching combined with SIFT descriptor in log-chromaticity color space
Y. S. Heo, Kyoung Mu Lee, Sang Uk Lee
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206507
Radiometric variations between input images can seriously degrade the performance of stereo matching algorithms. In this situation, mutual information is a popular and powerful measure, since it can capture any global relationship between the intensities of two input images taken from unknown sources. Mutual information-based methods, however, remain ambiguous or erroneous under local radiometric variations, since mutual information only accounts for global variation between images and does not properly incorporate spatial information. In this paper, we present a new method based on mutual information combined with the SIFT descriptor to find correspondences for images that undergo local as well as global radiometric variations. We transform the input color images to a log-chromaticity color space, in which a linear relationship between images can be established. To incorporate spatial information into the mutual information measure, we utilize the SIFT descriptor, which encodes gradient histograms of nearby pixels, to construct a joint probability in log-chromaticity color space. By combining mutual information as an appearance measure with the SIFT descriptor as a geometric measure, we devise a robust and accurate stereo system. Experimental results show that our method is superior to state-of-the-art algorithms, including conventional mutual information-based methods and window correlation methods, under various radiometric changes.
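The log-chromaticity transform itself is simple to demonstrate. Under a diagonal (per-channel gain) lighting model, a radiometric change becomes a constant additive offset in this space, which is the linear relationship the abstract exploits; the epsilon guard and toy data below are implementation details, not from the paper.

```python
import numpy as np

def log_chromaticity(rgb, eps=1e-6):
    """rgb: (..., 3) float image. Returns (..., 2) log-chromaticity values
    (log R/G, log B/G), which cancel overall intensity."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return np.stack([np.log((r + eps) / (g + eps)),
                     np.log((b + eps) / (g + eps))], axis=-1)

img = 0.1 + 0.9 * np.random.rand(4, 4, 3)        # keep channels well above eps
gains = np.array([1.5, 0.8, 1.2])                # per-channel radiometric change
delta = log_chromaticity(img * gains) - log_chromaticity(img)
# the gain change shows up as the same offset at every pixel
print(np.allclose(delta, delta.reshape(-1, 2)[0], atol=1e-4))  # True
```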
{"title":"Mutual information-based stereo matching combined with SIFT descriptor in log-chromaticity color space","authors":"Y. S. Heo, Kyoung Mu Lee, Sang Uk Lee","doi":"10.1109/CVPR.2009.5206507","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206507","url":null,"abstract":"Radiometric variations between input images can seriously degrade the performance of stereo matching algorithms. In this situation, mutual information is a very popular and powerful measure which can find any global relationship of intensities between two input images taken from unknown sources. The mutual information-based method, however, is still ambiguous or erroneous as regards local radiometric variations, since it only accounts for global variation between images, and does not contain spatial information properly. In this paper, we present a new method based on mutual information combined with SIFT descriptor to find correspondence for images which undergo local as well as global radiometric variations. We transform the input color images to log-chromaticity color space from which a linear relationship can be established. To incorporate spatial information in mutual information, we utilize the SIFT descriptor which includes near pixel gradient histogram to construct a joint probability in log-chromaticity color space. By combining the mutual information as an appearance measure and the SIFT descriptor as a geometric measure, we devise a robust and accurate stereo system. Experimental results show that our method is superior to the state-of-the art algorithms including conventional mutual information-based methods and window correlation methods under various radiometric changes.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132861938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive image and video retargeting technique based on Fourier analysis
Jun-Seong Kim, Jin-Hwan Kim, Chang-Su Kim
Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206666
An adaptive image and video retargeting algorithm based on Fourier analysis is proposed in this work. We first divide an input image into several strips using gradient information, so that each strip consists of textures of similar complexity. We then scale each strip adaptively according to its importance measure. More specifically, the distortions generated by the scaling procedure are formulated in the frequency domain using the Fourier transform. The objective is then to determine the sizes of the scaled strips that minimize the sum of distortions, subject to the constraint that the sum of their sizes equal the size of the target output image. We solve this constrained optimization problem using the Lagrange multiplier technique. Moreover, we extend the approach to the retargeting of video sequences. Simulation results demonstrate that the proposed algorithm provides reliable retargeting performance efficiently.
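The constrained optimization has a clean closed form under a simple distortion model. The sketch below uses a quadratic per-strip distortion D_i(s_i) = w_i (s_i - s0_i)^2 as an illustrative stand-in for the paper's Fourier-domain distortion: the Lagrangian conditions give s_i = s0_i - lambda / (2 w_i), with lambda fixed by the target-size constraint.

```python
import numpy as np

def strip_sizes(s0, w, target):
    """s0: original strip sizes, w: importance weights (higher = keep
    closer to original), target: total output size. Returns scaled sizes."""
    # from sum_i s_i = target with s_i = s0_i - lam / (2 w_i):
    lam = 2 * (s0.sum() - target) / (1 / w).sum()
    return s0 - lam / (2 * w)

s0 = np.array([120.0, 300.0, 80.0, 140.0])       # strip widths (pixels)
w = np.array([1.0, 8.0, 1.0, 4.0])               # texture complexity per strip
s = strip_sizes(s0, w, target=480.0)             # shrink 640 -> 480
print(np.round(s, 1), s.sum())                   # complex strips shrink less
```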
{"title":"Adaptive image and video retargeting technique based on Fourier analysis","authors":"Jun-Seong Kim, Jin-Hwan Kim, Chang-Su Kim","doi":"10.1109/CVPR.2009.5206666","DOIUrl":"https://doi.org/10.1109/CVPR.2009.5206666","url":null,"abstract":"An adaptive image and video retargeting algorithm based on Fourier analysis is proposed in this work. We first divide an input image into several strips using the gradient information so that each strip consists of textures of similar complexities. Then, we scale each strip adaptively according to its importance measure. More specifically, the distortions, generated by the scaling procedure, are formulated in the frequency domain using the Fourier transform. Then, the objective is to determine the sizes of scaled strips to minimize the sum of distortions, subject to the constraint that the sum of their sizes should equal the size of the target output image. We solve this constrained optimization problem using the Lagrangian multiplier technique. Moreover, we extend the approach to the retargeting of video sequences. Simulation results demonstrate that the proposed algorithm provides reliable retargeting performance efficiently.","PeriodicalId":386532,"journal":{"name":"2009 IEEE Conference on Computer Vision and Pattern Recognition","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126228766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}