Indoor Scene Recognition with a Visual Attention-Driven Spatial Pooling Strategy
Authors: Tarek Elguebaly, N. Bouguila
Published in: 2014 Canadian Conference on Computer and Robot Vision
Publication date: 2014-05-06
DOI: 10.1109/CRV.2014.43 (https://doi.org/10.1109/CRV.2014.43)
Citations: 5
Abstract
Scene recognition is an important research topic in robotics and computer vision. Although scene recognition has been studied in depth, progress on indoor scene categorization has been slow. Indoor scene recognition is challenging because of high intra-class variability, which stems mainly from the intrinsic variety of objects that may be present, and because of inter-class similarities among man-made indoor structures. As a result, most scene recognition techniques that work well for outdoor scenes perform poorly on indoor scenes. In this paper, we present a simple yet effective method for indoor scene recognition. Our approach proceeds as follows. First, we extract dense SIFT descriptors. Then, we combine a saliency-driven perceptual pooling with a simple spatial pooling scheme. Once the spatial and saliency-driven encodings have been determined, we use vector quantization to compute histograms of local features from each sub-region. The histograms from all sub-regions are then concatenated to generate the final representation of the image. Finally, a model-based mixture classifier, which uses mixture models to characterize class densities, is applied. To address the problem of modeling non-Gaussian data, which are largely present in our final image representation, we use the generalized Gaussian mixture (GGM), which can be a good alternative to the Gaussian thanks to its shape flexibility. The proposed statistical model is learned using the rival penalized expectation-maximization (RPEM) algorithm, which performs model selection and parameter learning together in a single step. Furthermore, we address the feature selection problem by determining a set of relevant features for each data cluster, which speeds up the learning algorithm and removes noisy, redundant, or uninformative features. To validate the proposed method, we test it on the MIT indoor scenes data set.
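The encoding stage described in the abstract (vector quantization, per-region histograms, concatenation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the grid layout, the saliency model, or how salient regions are separated, so the 2x2 grid, the median saliency threshold, and all function and parameter names here are assumptions. Random arrays stand in for dense SIFT descriptors and for a learned codebook.

```python
import numpy as np


def quantize(descriptors, codebook):
    """Assign each descriptor to its nearest codeword (hard vector quantization)."""
    # Squared Euclidean distances, shape (n_descriptors, n_codewords).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def region_histogram(words, mask, n_words):
    """L1-normalized codeword histogram over the descriptors selected by mask."""
    hist = np.bincount(words[mask], minlength=n_words).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


def pooled_representation(descriptors, positions, saliency, codebook,
                          image_size, grid=(2, 2)):
    """Concatenate spatial-grid and saliency-split histograms (hypothetical layout).

    positions holds (x, y) pixel coordinates, saliency holds one score per
    descriptor, and image_size is (height, width).
    """
    n_words = codebook.shape[0]
    words = quantize(descriptors, codebook)
    height, width = image_size
    rows, cols = grid

    # Spatial pooling: map every descriptor to a grid cell.
    row_idx = np.minimum((positions[:, 1] * rows / height).astype(int), rows - 1)
    col_idx = np.minimum((positions[:, 0] * cols / width).astype(int), cols - 1)
    hists = [
        region_histogram(words, (row_idx == r) & (col_idx == c), n_words)
        for r in range(rows)
        for c in range(cols)
    ]

    # Saliency-driven pooling (assumed rule): split at the median saliency score.
    salient = saliency >= np.median(saliency)
    hists.append(region_histogram(words, salient, n_words))
    hists.append(region_histogram(words, ~salient, n_words))

    # Final image representation: all sub-region histograms concatenated.
    return np.concatenate(hists)


# Toy usage with random stand-ins for dense SIFT descriptors and a codebook.
rng = np.random.default_rng(0)
descriptors = rng.random((500, 128))
positions = rng.random((500, 2)) * [64, 48]  # x in [0, 64), y in [0, 48)
saliency = rng.random(500)
codebook = rng.random((16, 128))
feature = pooled_representation(descriptors, positions, saliency, codebook,
                                image_size=(48, 64))
print(feature.shape)  # (96,): (4 grid cells + 2 saliency regions) * 16 codewords
```

In the pipeline the abstract outlines, a vector like `feature` would then be passed to the mixture-based classifier; that stage (GGM with RPEM learning and feature selection) is not sketched here.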