Deriving high-level scene descriptions from deep scene CNN features
Akram Bayat, M. Pomplun
2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), November 2017
DOI: 10.1109/IPTA.2017.8310111
Citations: 6
Abstract
In this paper, we build two computational models to estimate two dominant global properties (naturalness and openness) that represent a scene through its global spatial structure. Naturalness and openness are dominant perceptual properties within a multidimensional space in which semantically similar scenes (e.g., corridor and hallway) are assigned to nearby points. In this model space, a real-world scene is represented by its overall shape rather than by local object information. We introduce the use of a deep convolutional neural network to generate features well suited for estimating these two global properties of a visual scene. The extracted features are efficiently integrated and fed into a linear support vector machine (SVM) to classify naturalness versus man-madeness and openness versus closedness. These two global properties of an input image can be predicted from activations in the lowest layer of a convolutional neural network trained for a scene recognition task. The consistent results of the computational models across full and restricted spatial frequency ranges suggest that the representation of an image in the lowest layer of the deep scene CNN contains holistic information about the image, as it yields the highest accuracy in modelling the global shape of the scene.
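The pipeline the abstract describes — pool activations from the lowest (convolutional) layer of a scene CNN into a feature vector, then train a linear classifier to separate natural from man-made scenes — can be illustrated with a minimal NumPy sketch. This is not the authors' code: the paper uses conv1 activations of a deep scene CNN and a linear SVM, whereas the hand-built filter bank, toy images, and perceptron below are illustrative stand-ins for those components.

```python
import numpy as np

# Two hand-built "lowest layer" filters standing in for learned conv1 kernels.
FILTERS = np.array([
    [[1.0, -1.0], [1.0, -1.0]],        # vertical-edge detector
    [[0.25, 0.25], [0.25, 0.25]],      # local-mean (smoothing) filter
])

def conv1_features(img, filters=FILTERS):
    """Valid 2-D convolution with each filter, ReLU, then global
    average pooling -- one scalar feature per filter map."""
    H, W = img.shape
    k = filters.shape[1]
    feats = []
    for f in filters:
        out = np.zeros((H - k + 1, W - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + k, j:j + k] * f)
        feats.append(np.maximum(out, 0.0).mean())   # ReLU + pooling
    return np.array(feats)

def perceptron(X, y, epochs=100):
    """Tiny linear classifier standing in for the paper's linear SVM."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified -> update
                w += yi * xi
                b += yi
    return w, b

# Toy scenes: an edge-free "natural" patch vs. a striped "man-made" patch.
smooth = np.ones((8, 8))                   # no edges -> zero edge response
stripes = np.tile([1.0, 0.0], (8, 4))      # strong vertical edges

feats_smooth = conv1_features(smooth)
feats_stripes = conv1_features(stripes)

# Labels: -1 = natural, +1 = man-made; repeat each prototype a few times.
X = np.array([feats_smooth] * 5 + [feats_stripes] * 5)
y = np.array([-1.0] * 5 + [1.0] * 5)
w, b = perceptron(X, y)
```

The edge filter responds strongly to the striped (man-made) patch and not at all to the smooth (natural) one, so the pooled features are linearly separable and the classifier learns the naturalness boundary — a toy analogue of feeding pooled conv1 activations to a linear SVM.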