Computing 3D saliency from a 2D image
Sudarshan Ramenahalli, E. Niebur
2013 47th Annual Conference on Information Sciences and Systems (CISS), 2013-03-20
DOI: 10.1109/CISS.2013.6552297
Citations: 8
Abstract
A saliency map is a model of visual selective attention that uses purely bottom-up features of an image, such as color, intensity and orientation. Another bottom-up feature of visual input is depth, the distance between the eye (or sensor) and objects in the visual field. In this report we study the effect of depth on saliency. Unlike previous work, we do not use measured depth (disparity) information but instead compute a 3D depth map from the 2D image using a depth learning algorithm. This computed depth is then added as an additional feature channel to the 2D saliency map, and all feature channels are linearly combined with equal weights to obtain a 3D saliency map. We compare the efficacy of the 2D and 3D saliency maps in predicting human eye fixations using three different performance measures. The 3D saliency map outperforms its 2D counterpart on all three measures. Perhaps surprisingly, our 3D saliency map computed from a 2D image performs better than an existing 3D saliency model that uses explicit depth information.
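The equal-weight linear combination of feature channels described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the function names `normalize` and `combine_equal_weights` are hypothetical, the min-max rescaling to [0, 1] stands in for whatever normalization the saliency model actually applies, and random arrays stand in for real color, intensity, orientation and estimated-depth maps.

```python
import numpy as np

def normalize(channel):
    """Rescale a feature map to [0, 1] (min-max; an assumed normalization)."""
    channel = np.asarray(channel, dtype=float)
    lo, hi = channel.min(), channel.max()
    if hi == lo:
        # A constant map carries no spatial contrast, so it contributes zeros.
        return np.zeros_like(channel)
    return (channel - lo) / (hi - lo)

def combine_equal_weights(channels):
    """Linearly combine feature maps, giving each channel weight 1/N."""
    maps = [normalize(c) for c in channels]
    return np.mean(maps, axis=0)

# Placeholder feature maps; in practice these would come from a saliency
# model and from a depth estimate computed from the 2D image.
rng = np.random.default_rng(0)
color, intensity, orientation, depth = (rng.random((4, 6)) for _ in range(4))

saliency_2d = combine_equal_weights([color, intensity, orientation])
saliency_3d = combine_equal_weights([color, intensity, orientation, depth])
```

With four channels, each contributes a weight of 1/4 to `saliency_3d`, so adding depth dilutes the three 2D channels from 1/3 to 1/4 each rather than leaving them unchanged. Whether near or far regions should map to high depth-channel values is not stated in the abstract and is left open in this sketch.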