Indoor Scene Recognition with a Visual Attention-Driven Spatial Pooling Strategy

2014 Canadian Conference on Computer and Robot Vision. Pub Date: 2014-05-06. DOI: 10.1109/CRV.2014.43

Tarek Elguebaly, N. Bouguila


Citations: 5

Abstract

Scene recognition is an important research topic in robotics and computer vision. Although scene recognition has been studied in depth, progress on indoor scene categorization has been slow. Indoor scene recognition is challenging because of high intra-class variability, caused mainly by the intrinsic variety of objects that may be present, and because of inter-class similarities among man-made indoor structures. As a result, most scene recognition techniques that work well for outdoor scenes perform poorly on indoor scenes. In this paper, we present a simple yet effective method for indoor scene recognition. Our approach proceeds as follows. First, we extract dense SIFT descriptors. Then, we combine a saliency-driven perceptual pooling with a simple spatial pooling scheme. Once the spatial and saliency-driven encodings have been determined, we use vector quantization to compute histograms of local features from each sub-region. The histograms from all sub-regions are then concatenated to generate the final representation of the image. Finally, a model-based mixture classifier, which uses mixture models to characterize class densities, is applied. To address the problem of modeling non-Gaussian data, which are largely present in our final image representation, we use the generalized Gaussian mixture (GGM), which can be a good alternative to the Gaussian thanks to its shape flexibility. The proposed statistical model is learned using the rival penalized expectation-maximization (RPEM) algorithm, which performs model selection and parameter learning together in a single step. Furthermore, we address the feature selection problem by determining a set of relevant features for each data cluster, which speeds up the learning algorithm and removes noisy, redundant, or uninformative features. To validate the proposed method, we test it on the MIT indoor scenes data set.
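
The pooling and encoding steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the saliency model, the grid layout, or the codebook size, so the 2x2 grid, the single saliency threshold, and every function and parameter name below (`quantize`, `region_histogram`, `pooled_representation`, `saliency_threshold`) are assumptions made for illustration. The sketch also assumes that dense SIFT descriptors, their pixel positions, a per-descriptor saliency score in [0, 1], and a visual-word codebook have already been computed elsewhere.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor to its nearest codeword (hard vector quantization)."""
    # Squared Euclidean distances, shape (n_descriptors, n_codewords).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def region_histogram(words, mask, n_words):
    """L1-normalized histogram of the visual words selected by a boolean mask."""
    hist = np.bincount(words[mask], minlength=n_words).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def pooled_representation(descriptors, positions, saliency, codebook,
                          image_size, grid=(2, 2), saliency_threshold=0.5):
    """Concatenate per-region histograms from spatial and saliency pooling.

    positions: (n, 2) array of (row, col) pixel coordinates.
    saliency:  (n,) array of saliency scores in [0, 1] (assumed given).
    """
    n_words = codebook.shape[0]
    words = quantize(descriptors, codebook)
    height, width = image_size
    histograms = []

    # Spatial pooling (assumed regular grid): one histogram per grid cell.
    rows = np.minimum((positions[:, 0] * grid[0] / height).astype(int), grid[0] - 1)
    cols = np.minimum((positions[:, 1] * grid[1] / width).astype(int), grid[1] - 1)
    for r in range(grid[0]):
        for c in range(grid[1]):
            in_cell = (rows == r) & (cols == c)
            histograms.append(region_histogram(words, in_cell, n_words))

    # Saliency-driven pooling (assumed two regions: salient vs. non-salient).
    salient = saliency >= saliency_threshold
    histograms.append(region_histogram(words, salient, n_words))
    histograms.append(region_histogram(words, ~salient, n_words))

    # Final image representation: all sub-region histograms concatenated.
    return np.concatenate(histograms)

# Usage with random stand-in data (not real SIFT descriptors).
rng = np.random.default_rng(0)
descriptors = rng.random((200, 128))
positions = rng.random((200, 2)) * [240, 320]
saliency = rng.random(200)
codebook = rng.random((16, 128))
feature = pooled_representation(descriptors, positions, saliency, codebook, (240, 320))
print(feature.shape)  # (96,) = (4 grid cells + 2 saliency regions) * 16 codewords
```

In the method described above, a vector of this kind would then be passed to the mixture-based classifier. That stage (the generalized Gaussian mixture learned with RPEM, plus per-cluster feature selection) is not sketched here.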


