Don't see me, just filter me: towards secure cloud based filtering using Shamir's secret sharing and POB number system
Priyanka Singh, Nishant Agarwal, B. Raman. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, pages 12:1-12:8, 2016. DOI: https://doi.org/10.1145/3009977.3010036

The cloud computing paradigm is attracting individuals and organizations around the globe with its powerful resources, storage hubs, computational power and cost-effective solutions. However, distributed cloud data centers pose a high risk of security breaches to data privacy. A secure image-sharing scheme over the cloud, based on Shamir's secret sharing and the permutation ordered binary (POB) number system, is proposed here. It distributes the image information into multiple random shares that individually reveal no information and can therefore be stored securely across distributed cloud data centers. Different image operations can be applied in the encrypted domain on the cloud servers themselves, reducing the security threat that is a major hindrance to adopting cloud-based architectures. Only the authentic owner possessing the secret keys can restore the original image from the random shares. Comparative results for both the plain and encrypted domains validate the efficiency of applying various image filtering operations, such as motion blur, unsharp masking, Wiener and Gaussian filtering, in the encrypted domain over the cloud.

MUSA: a banana database for ripening level determination
Senthilarasi M, Md Mansoor Roomi S, Sheik Naveedh A. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, pages 71:1-71:7, 2016. DOI: https://doi.org/10.1145/3009977.3009996

Ripening treatment of bananas is accomplished globally with controlled ethylene gas, temperature, airflow, humidity and time. During ripening, the peel colour changes from green to yellow with brown spots. The shelf life of a banana is reflected in quality indices and colour transformations that impact characteristics such as softness, sweetness and taste. An automatic control system can therefore monitor the ripening level of bananas to maintain peel colour, firm pulp and texture. Appropriate datasets are required for the experimentation and evaluation of ripening-level determination algorithms. This paper generates a database of Musa species (yellow bananas) at different ripening levels: unripe, ripe and overripe. Rasthali (Musa AAB) and Monthan (Musa ABB) hands were chosen as samples. The MUSA database comprises 3108 banana images acquired at 7 view angles and 12 rotations under constant illumination. The usefulness of the MUSA database is demonstrated with state-of-the-art ripening-level determination algorithms.

CRF based method for curb detection using semantic cues and stereo depth
Danish Sodhi, Sarthak Upadhyay, Dhaivat Bhatt, K. Krishna, S. Swarup. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, pages 41:1-41:7, 2016. DOI: https://doi.org/10.1145/3009977.3010058

Curb detection is a critical component of driver assistance and autonomous driving systems. In this paper, we present a discriminative approach to curb detection under diverse road conditions. We define curbs as the intersection of drivable and non-drivable areas, classified using dense conditional random fields (CRFs). Our method fuses the output of a neural network for pixel-wise semantic segmentation with depth and color information from stereo cameras: the CRF combines the deep model's output with the height information available in the stereo data to provide improved segmentation. We further introduce temporal smoothness by using a weighted average of the SegNet output and the output of a probabilistic voxel grid as our unary potential. Finally, we show improvements over state-of-the-art neural networks. The proposed method gives accurate results over a large range of variations in curb curvature and appearance, without retraining the model for a specific dataset.

Exposing splicing forgeries in digital images through dichromatic plane histogram discrepancies
A. Mazumdar, P. Bora. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, pages 62:1-62:8, 2016. DOI: https://doi.org/10.1145/3009977.3010032

One of the most common image forgery techniques is splicing, where parts of different images are copied and pasted onto a single image. This paper proposes a new forensic method for detecting splicing forgeries in images containing human faces. Our approach extracts an illumination signature from each face in an image using the dichromatic reflection model (DRM). The dichromatic plane histogram (DPH), computed by applying the 2D Hough transform to the face image, serves as this signature. A correlation measure computes the similarity between the DPHs of the different faces in the image, and a simple threshold on this similarity exposes splicing forgeries. Experimental results show the efficacy of the proposed method.

Learning to hash-tag videos with Tag2Vec
A. Singh, Saurabh Saini, R. Shah, P J Narayanan. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, pages 94:1-94:8, 2016. DOI: https://doi.org/10.1145/3009977.3010035

User-given tags or labels are valuable resources for the semantic understanding of visual media such as images and videos. Recently, hash-tags have become an increasingly popular labeling mechanism on social media sites. In this paper, we study the problem of generating relevant and useful hash-tags for short video clips. Traditional data-driven approaches to tag enrichment and recommendation use direct visual similarity for label transfer and propagation; we instead learn a direct, low-cost mapping from videos to hash-tags with a two-step training process. First, we apply a natural language processing (NLP) technique, skip-gram training with a neural network, to learn a low-dimensional vector representation of hash-tags (Tag2Vec) from a corpus of ~10 million hash-tags. We then train an embedding function that maps video features into the low-dimensional Tag2Vec space, learned over 29 categories of hash-tagged short video clips. A query video without any tag information can be mapped directly into the tag vector space using this embedding, and relevant tags retrieved by a simple nearest-neighbour search in the Tag2Vec space. We validate the relevance of the suggested tags qualitatively and quantitatively with a user study.

Deep fusion of visual signatures for client-server facial analysis
Binod Bhattarai, Gaurav Sharma, F. Jurie. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, pages 42:1-42:8, 2016. DOI: https://doi.org/10.1145/3009977.3010062

Facial analysis is a key technology for enabling human-machine interaction. In this context, we present a client-server framework in which a client transmits the signature of a face to be analyzed to the server and, in return, the server sends back various information describing the face: is the person male or female, bald, mustached, and so on. We assume the client can compute one (or a combination) of visual features, from the simple and efficient, such as Local Binary Patterns (LBP), to the complex and computationally heavy, such as Fisher Vectors and CNN-based features, depending on the computing resources available. The challenge addressed in this paper is to design a common universal representation such that a single merged signature is transmitted to the server, whatever the type and number of features computed by the client, while still ensuring optimal performance. Our solution is based on learning a common optimal subspace that aligns the different face features and merges them into a universal signature. We validate the proposed method on the challenging CelebA dataset, on which it outperforms existing state-of-the-art methods when a rich representation is available at test time, while giving competitive performance when only simple signatures (such as LBP) are available due to resource constraints on the client.

Realtime motion detection based on the spatio-temporal median filter using GPU integral histograms
M. Poostchi, K. Palaniappan, F. Bunyak, G. Seetharaman. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, page 19, 2012. DOI: https://doi.org/10.1145/2425333.2425352

Motion detection using background modeling is a widely used technique in object tracking. To meet the demands of real-time multi-target tracking in large and/or high-resolution imagery, fast parallel algorithms for motion detection are desirable. One common method for background modeling is an adaptive 3D median filter that is updated appropriately from the video sequence. We describe a parallel 3D spatio-temporal median filter algorithm implemented in CUDA for many-core Graphics Processing Unit (GPU) architectures, using the integral histogram as a building block to support adaptive window sizes. Both 2D and 3D median filters are also widely used in other computer vision tasks such as denoising, segmentation and recognition. Although fast sequential median algorithms exist, parallelization reduces the time needed for motion detection, leaving room for the more complex processing required in multi-target tracking systems, large high-resolution aerial video imagery and 3D volumetric processing. The GPU implementation runs 60 times faster than the CPU version for 1K x 1K images, reaching 49 frames/sec, and 21 times faster for 512 x 512 frames, reaching 194 frames/sec. We characterize the performance of the parallel 3D median filter for different image sizes and numbers of histogram bins, and show selected motion detection results.

Objective evaluation of noisy multimodal medical image fusion using Daubechies complex wavelet transform
Rajiv Singh, A. Khare. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, page 72, 2012. DOI: https://doi.org/10.1145/2425333.2425405

Medical image fusion needs careful attention because images obtained from medical instruments have poor contrast and are corrupted by blur and noise due to the imperfections of capture devices; the objective evaluation of fusion techniques in the noisy domain is therefore an important task. In the present work, we propose maximum-selection and energy-based fusion rules for evaluating noisy multimodal medical image fusion using the Daubechies complex wavelet transform (DCxWT). Unlike traditional real-valued wavelet transforms, which suffer from shift sensitivity and provide no phase information, the DCxWT is shift invariant and provides phase information through its imaginary coefficients; both properties prove useful for fusing multimodal medical images. Experiments were performed over several sets of noisy medical images at multiple noise levels, and the proposed fusion scheme was tested up to the maximum levels of Gaussian, salt & pepper and speckle noise. Objective evaluation is performed with fusion factor, fusion symmetry, entropy, standard deviation and edge-information metrics. Results are shown for two sets of multimodal medical images under the maximum-selection and energy-based fusion rules, with comparisons against lifting wavelet transform (LWT) and stationary wavelet transform (SWT) based fusion methods. Comparative analysis at different noise levels shows the superiority of the proposed scheme, and plots of the fusion metrics against the maximum levels of Gaussian, salt & pepper and speckle noise show its robustness to noise.

On the use of regions for semantic image segmentation
Rui Hu, Diane Larlus, G. Csurka. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, page 51, 2012. DOI: https://doi.org/10.1145/2425333.2425384

A general trend in recent methods is to use image regions (i.e. superpixels), obtained in an unsupervised way, to enhance the semantic image segmentation task. This paper proposes a detailed study of the role and benefit of using these regions at different steps of the segmentation process. For the purpose of this benchmark, we propose a simple semantic segmentation system that uses a hierarchy of regions, and compare it with a patch-based system with similar settings, which allows us to evaluate the contribution of each component. Both systems are evaluated on the standard MSRC-21 dataset and obtain competitive results. We show that the proposed region-based system achieves good results without any complex regularization, while its patch-based counterpart becomes competitive only when image priors and regularization methods are used. The latter benefits more from CRF-based regularization, yielding state-of-the-art results with simple constraints based only on the leaf regions exploited in the pairwise potential.

Matte based generation of land cover maps
K. Bahirat, S. Chaudhuri. Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing, page 40, 2012. DOI: https://doi.org/10.1145/2425333.2425373

This paper presents a novel supervised technique, based on class matting, for generating spatially consistent land cover maps. The method takes advantage of both a standard supervised classification technique and natural image matting. Through image matting it adaptively exploits the spatial contextual information in the neighborhood of each pixel to reduce the incongruence inherent in pixel-wise, radiometric classification of multi-spectral remote sensing data, providing a more spatially homogeneous land-cover map as well as better accuracy. To make image matting applicable to N-class land cover map generation, we extend the basic alpha matting problem into N independent matting problems, each conforming to one particular class. The user input normally required by an alpha matting algorithm, namely the initial identification of a few sample regions belonging to a particular class (the foreground object in matting), is obtained automatically using the supervised ML classifier. Experimental results on multispectral data sets confirm the effectiveness of the proposed system.
