A family of contextual measures of similarity between distributions with application to image retrieval
F. Perronnin, Yan Liu, J. Renders
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206505

We introduce a novel family of contextual measures of similarity between distributions: the similarity between two distributions q and p is measured in the context of a third distribution u. In our framework, any traditional measure of similarity or dissimilarity has a contextual counterpart. We show that for two important families of divergences (Bregman and Csiszár), computing the contextual similarity amounts to solving a convex optimization problem. We focus on the case of multinomials and explain how to compute the similarity in practice for several well-known measures. These contextual measures are then applied to image retrieval, where the context u is estimated from the neighbors of a query q. A main benefit of our approach is that different contexts, and especially contexts at multiple scales (i.e. broad and narrow contexts), provide different views of the same problem, and combining these views can improve retrieval accuracy. We show on two very different datasets (one of photographs, the other of document images) that the proposed measures have a relatively small positive impact on macro Average Precision (which measures ranking alone) and a large positive impact on micro Average Precision (which measures both ranking and the consistency of scores across queries).

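As a point of reference for the measures this paper makes contextual, here is a minimal sketch of a plain (non-contextual) KL divergence between multinomials. The uniform `u` below is only an illustrative context distribution; the paper's contextual counterpart is obtained by solving a convex problem involving u, not by direct evaluation like this.

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two multinomials given as probability lists.
    A small eps guards against log(0) on zero-probability bins."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

q = [0.5, 0.3, 0.2]
p = [0.4, 0.4, 0.2]
u = [1 / 3, 1 / 3, 1 / 3]  # a uniform "context" distribution, for illustration

print(kl_divergence(q, p))  # non-negative, zero iff q == p
print(kl_divergence(q, u))
```
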
Discriminative subvolume search for efficient action detection
Junsong Yuan, Zicheng Liu, Ying Wu
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206671

Actions are spatio-temporal patterns that can be characterized by collections of spatio-temporal invariant features. Detecting actions amounts to finding re-occurrences of such patterns (e.g. through pattern matching). This paper addresses two critical issues in pattern-matching-based action detection: (1) the efficiency of pattern search in 3D videos and (2) tolerance to intra-pattern variations of actions. Our contributions are two-fold. First, we propose a discriminative pattern matching scheme, naive-Bayes-based mutual information maximization (NBMIM), for multi-class action categorization; it improves on state-of-the-art results on the standard KTH dataset. Second, we propose a novel search algorithm that locates the optimal subvolume in the 3D video space for efficient action detection. Our method is purely data-driven and does not rely on object detection, tracking, or background subtraction. It handles intra-pattern variations of actions, such as scale and speed variations, and is insensitive to dynamic and cluttered backgrounds and even partial occlusions. Experiments on varied datasets, including the KTH and CMU action datasets, demonstrate the effectiveness and efficiency of our method.

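The core search problem above — find the region whose summed per-feature discriminative scores is maximal — has a simple 1D analogue. The sketch below (my toy, not the paper's branch-and-bound 3D search) uses Kadane's algorithm to find the best-scoring contiguous interval given NBMIM-style per-feature scores (positive where the action class is likely).

```python
def max_score_interval(scores):
    """Kadane's algorithm: returns (best_sum, lo, hi) for the contiguous
    interval [lo, hi) maximizing the summed score. A 1D stand-in for the
    paper's optimal-subvolume search in 3D video space."""
    best_sum, best_lo, best_hi = float("-inf"), 0, 0
    cur_sum, cur_lo = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:          # restarting beats extending a non-positive run
            cur_sum, cur_lo = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_lo, best_hi = cur_sum, cur_lo, i + 1
    return best_sum, best_lo, best_hi

# Hypothetical per-feature log-likelihood-ratio scores along time.
scores = [-1.0, 2.0, 3.0, -4.0, 1.5, 2.5, -1.0]
print(max_score_interval(scores))  # (5.0, 1, 3)
```
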
Multiphase geometric couplings for the segmentation of neural processes
Amelio Vázquez Reina, E. Miller, H. Pfister
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206524

The ability to constrain the geometry of deformable models for image segmentation can be useful when information about the expected shape or positioning of the objects in a scene is known a priori. An example of this occurs when segmenting neural cross sections in electron microscopy. Such images often contain multiple nested boundaries separating regions of homogeneous intensities. For these applications, multiphase level sets provide a partitioning framework that allows for the segmentation of multiple deformable objects by combining several level set functions. Although there has been much effort in the study of statistical shape priors that can be used to constrain the geometry of each partition, none of these methods allow for the direct modeling of geometric arrangements of partitions. In this paper, we show how to define elastic couplings between multiple level set functions to model ribbon-like partitions. We build such couplings using dynamic force fields that can depend on the image content and the relative location and shape of the level set functions. To the best of our knowledge, this is the first work that shows a direct way of geometrically constraining multiphase level sets for image segmentation. We demonstrate the robustness of our method by comparing it with previous level set segmentation methods.

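To make the idea of an elastic coupling concrete, here is a deliberately simplified 1D sketch (mine, not the paper's formulation): two fronts are represented by sampled signed-distance functions, and a spring-like energy is zero exactly when they sit a target ribbon width apart. The paper instead couples 2D fronts through dynamic, image-dependent force fields.

```python
def ribbon_coupling_energy(phi_out, phi_in, width, k=1.0):
    """Spring-like coupling between two sampled 1D signed-distance
    functions. For parallel fronts, phi_out - phi_in equals the (constant)
    gap between their zero crossings, so this energy vanishes exactly when
    the fronts are `width` apart."""
    n = len(phi_out)
    return 0.5 * k * sum((a - b - width) ** 2
                         for a, b in zip(phi_out, phi_in)) / n

xs = range(10)
phi_out = [x - 2.0 for x in xs]   # outer front crosses zero at x = 2
phi_in = [x - 5.0 for x in xs]    # inner front crosses zero at x = 5, gap = 3
print(ribbon_coupling_energy(phi_out, phi_in, width=3.0))  # 0.0: exactly at target width
print(ribbon_coupling_energy(phi_out, phi_in, width=2.0))  # positive: width violated
```
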
Joint and implicit registration for face recognition
Peng Li, S. Prince
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206607

Contemporary face recognition algorithms rely on precise localization of keypoints (corner of eye, nose, etc.). Unfortunately, finding keypoints reliably and accurately remains a hard problem. In this paper we pose two questions. First, is it possible to exploit the gallery image in order to find keypoints in the probe image? For instance, consider finding the left eye in the probe image. Rather than using a generic eye model, we use a model that is informed by the appearance of the eye in the gallery image. To this end we develop a probabilistic model which combines recognition and keypoint localization. Second, is it necessary to localize keypoints? Alternatively, we can consider keypoint position as a hidden variable which we marginalize over in a Bayesian manner. We demonstrate that both of these innovations improve performance relative to conventional methods in both frontal and cross-pose face recognition.

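The second idea above — treating keypoint position as a hidden variable and marginalizing over it — boils down to a stably computed log-sum-exp over candidate positions. The sketch below is a generic illustration with hypothetical per-position scores and a uniform prior, not the paper's actual model.

```python
import math

def marginal_log_score(log_scores, log_prior):
    """log sum_x p(match | x) p(x): marginalize a per-position match
    likelihood over the hidden keypoint position x, using the standard
    log-sum-exp trick for numerical stability."""
    terms = [ls + lp for ls, lp in zip(log_scores, log_prior)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical log-likelihoods for four candidate eye locations in the
# probe image, with a uniform prior over positions.
log_scores = [-5.0, -1.0, -4.0, -6.0]
log_prior = [math.log(0.25)] * 4
print(marginal_log_score(log_scores, log_prior))
```

The marginal score is dominated by the best position but never commits to it, which is the point: no hard localization decision is made.
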
Global optimization for alignment of generalized shapes
Hongsheng Li, Tian Shen, Xiaolei Huang
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206548

In this paper, we introduce a novel algorithm for solving global shape registration problems. We use gray-scale "images" to represent source shapes, and propose a novel two-component Gaussian Mixture (GM) distance map representation for target shapes. Based on this flexible, asymmetric, image-based representation, we define a new energy function that proves to be a more robust shape dissimilarity metric and can be computed efficiently. Such efficiency is essential for global optimization methods; we adopt one of them, Particle Swarm Optimization (PSO), to effectively estimate the global optimum of the new energy function. Experiments and comparisons on generalized shape data, including continuous shapes, unstructured sparse point sets, and gradient maps, demonstrate the robustness and effectiveness of the algorithm.

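For readers unfamiliar with PSO, here is a minimal, self-contained implementation minimizing a toy quadratic (the function, box bounds, and hyperparameters below are illustrative defaults, not the paper's settings; the paper minimizes its image-based shape dissimilarity instead).

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=100, lo=-5.0, hi=5.0,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal Particle Swarm Optimization over the box [lo, hi]^dim.
    Each particle tracks its personal best; the swarm tracks a global best;
    velocities blend inertia with pulls toward both bests."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(hi, max(lo, xs[i][d] + vs[i][d]))
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i][:], fx
    return gbest, gbest_f

# Toy energy with its global minimum at (1, -2).
best, best_f = pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, dim=2)
print(best, best_f)
```

Because PSO only evaluates the energy (no gradients), it pairs naturally with the efficiently computable dissimilarity metric the paper proposes.
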
An instance selection approach to Multiple Instance Learning
Zhouyu Fu, A. Robles-Kelly
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206655

Multiple-instance learning (MIL) is a new paradigm of supervised learning that deals with the classification of bags, where each bag is presented as a collection of instances from which features are extracted. In MIL, we are usually confronted with a large instance space even for moderately sized data sets, since each bag may contain many instances. It is therefore important to design efficient instance pruning and selection techniques that speed up the learning process without compromising performance. In this paper, we address instance selection in multiple instance learning and propose IS-MIL, an instance selection framework for tackling large-scale MIL problems. IS-MIL is based on an alternating optimisation scheme that iteratively repeats the steps of instance selection/updating and classifier learning, and is guaranteed to converge. Experimental results demonstrate the utility and efficiency of the proposed approach compared to the alternatives.

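To ground the bag/instance terminology, here is a toy sketch of the standard MIL assumption (a bag is positive if any instance is) together with a witness-selection step in the spirit of the selection/learning alternation described above. The linear scorer and the bag data are illustrative, not from the paper.

```python
def bag_score(instances, score):
    """Standard MIL assumption: a bag's score is the max over the scores of
    its instances, so one positive instance makes the bag positive."""
    return max(score(x) for x in instances)

def select_witness(instances, score):
    """Pick the instance responsible for the bag score. An alternating
    scheme like IS-MIL interleaves a selection step of this flavour with
    re-training the instance-level classifier."""
    return max(instances, key=score)

score = lambda x: x[0] - x[1]          # toy linear instance scorer
bag = [(0.2, 0.9), (1.4, 0.3), (0.5, 0.5)]
print(bag_score(bag, score))           # the bag's best instance score
print(select_witness(bag, score))      # the witness instance itself
```
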
Symmetry integrated region-based image segmentation
Yu Sun, B. Bhanu
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206570

Symmetry is an important cue for machine perception that involves high-level knowledge of image components. Unlike most previous research, which only computes symmetry in an image, this paper integrates symmetry with image segmentation to improve segmentation performance. The symmetry integration is used to optimize both the segmentation and the symmetry of regions simultaneously. Interest points are initially extracted from an image and then refined to detect the symmetry axis. A symmetry affinity matrix is used explicitly as a constraint in a region-growing algorithm in order to refine the symmetry of the segmented regions. Experimental results and comparisons on a wide range of images indicate a promising improvement from symmetry-integrated image segmentation over other segmentation methods that do not exploit symmetry.

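A symmetry affinity is, at heart, a score for how well a signal matches its own mirror image. The toy function below (my illustration; the paper builds a full affinity matrix from 2D patches around a detected axis) scores a 1D intensity profile about its midpoint.

```python
def symmetry_affinity(row):
    """Toy symmetry affinity of a 1D intensity profile about its midpoint:
    1 - mean absolute difference between the profile and its mirror image,
    normalized by the intensity span. Returns 1.0 for a perfectly
    symmetric profile."""
    mirrored = row[::-1]
    diff = sum(abs(a - b) for a, b in zip(row, mirrored)) / len(row)
    span = max(row) - min(row) or 1.0   # avoid dividing by zero on flat rows
    return 1.0 - diff / span

print(symmetry_affinity([1.0, 2.0, 5.0, 2.0, 1.0]))  # symmetric profile
print(symmetry_affinity([1.0, 2.0, 3.0, 4.0, 5.0]))  # asymmetric ramp
```
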
Nonparametric scene parsing: Label transfer via dense scene alignment
Ce Liu, Jenny Yuen, A. Torralba
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206536

In this paper we propose a novel nonparametric approach to object recognition and scene parsing using dense scene alignment. Given an input image, we retrieve its best matches from a large database of annotated images using our modified, coarse-to-fine SIFT flow algorithm, which aligns the structures within two images. Based on the dense scene correspondence obtained from the SIFT flow, our system warps the existing annotations and integrates multiple cues in a Markov random field framework to segment and recognize the query image. Our nonparametric scene parsing system achieves promising experimental results on a challenging database. Compared to existing object recognition approaches that require training for each object category, our system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.

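The nonparametric recipe — retrieve similar annotated images, then transfer their annotations — can be sketched in a few lines. The version below transfers a single scene label by nearest-neighbour histogram matching; this is a drastically simplified stand-in (my own toy, with made-up histograms and labels) for the paper's warping of dense per-pixel annotations along SIFT-flow correspondences.

```python
def l1_distance(h1, h2):
    """L1 distance between two feature histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def transfer_label(query_hist, database):
    """Label the query with the annotation of its nearest neighbour in a
    database of (histogram, label) pairs: retrieval, then transfer."""
    _, label = min(database, key=lambda item: l1_distance(query_hist, item[0]))
    return label

database = [([0.8, 0.1, 0.1], "street"),
            ([0.1, 0.8, 0.1], "beach"),
            ([0.1, 0.1, 0.8], "forest")]
print(transfer_label([0.7, 0.2, 0.1], database))  # nearest annotated scene wins
```
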
Visual tracking with online Multiple Instance Learning
Boris Babenko, Ming-Hsuan Yang, Serge J. Belongie
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206737

In this paper, we address the problem of learning an adaptive appearance model for object tracking. In particular, a class of tracking techniques called "tracking by detection" has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. The classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrade the classifier and can cause further drift. In this paper we show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems and can therefore lead to a more robust tracker with fewer parameter tweaks. We present a novel online MIL algorithm for object tracking that achieves superior results with real-time performance.

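The reason MIL tolerates sloppy example extraction is visible in the noisy-OR bag model commonly used in MIL boosting: a bag of image patches cropped around the tracker state is positive as long as at least one patch is the object, so no single patch has to be labeled exactly right. A minimal sketch (the probabilities are illustrative):

```python
def noisy_or(instance_probs):
    """Noisy-OR bag probability: P(bag positive) = 1 - prod(1 - p_i).
    The bag is positive if at least one instance is positive, so one
    confident instance suffices even when its neighbours are background."""
    prod = 1.0
    for p in instance_probs:
        prod *= (1.0 - p)
    return 1.0 - prod

# A bag of patches around the tracker state: one patch clearly contains
# the object, the others are slightly off-target.
print(noisy_or([0.1, 0.9, 0.2]))   # high: the bag contains the object
print(noisy_or([0.1, 0.1, 0.1]))   # low: probably all background
```
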
Stacks of convolutional Restricted Boltzmann Machines for shift-invariant feature learning
Mohammad Norouzi, Mani Ranjbar, Greg Mori
2009 IEEE Conference on Computer Vision and Pattern Recognition. Pub Date: 2009-06-20. DOI: 10.1109/CVPR.2009.5206577

In this paper we present a method for learning class-specific features for recognition. Recently, a greedy layer-wise procedure was proposed to initialize the weights of deep belief networks by viewing each layer as a separate restricted Boltzmann machine (RBM). We develop the convolutional RBM (C-RBM), a variant of the RBM in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling, and we learn the feature parameters of the first and third layers by viewing them as separate C-RBMs. The outputs of the feature extraction hierarchy are then fed as input to a discriminative classifier. We demonstrate experimentally that the extracted features are effective for object detection, using them to obtain performance comparable to the state of the art on handwritten digit recognition and pedestrian detection.

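The two layer types in the hierarchy above — weight-tied filtering and maximum subsampling — are easy to show in 1D. The sketch below (a toy illustration, not the C-RBM's learned filters) applies a single shared filter across a signal, then max-pools the responses; sliding one shared filter everywhere is exactly the weight-sharing idea the C-RBM adds to the RBM.

```python
def filter1d_valid(signal, kernel):
    """'Valid' correlation-style filtering (no kernel flip, as is common in
    vision code) with one shared filter slid across the whole signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_subsample(xs, pool=2):
    """Non-overlapping max pooling: the 'maximum subsampling' layer, which
    keeps the strongest response in each window for shift tolerance."""
    return [max(xs[i:i + pool]) for i in range(0, len(xs) - pool + 1, pool)]

signal = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
edge = [1.0, -1.0]                       # toy edge-detecting filter
responses = filter1d_valid(signal, edge)
print(responses)                         # [-1.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
print(max_subsample(responses, pool=2))  # [1.0, 0.0, 1.0]
```
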