Extended multi-spectral face recognition across two different age groups: an empirical study
N. Vetrekar, Ramachandra Raghavendra, A. Gaonkar, G. Naik, R. Gad
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 78:1–78:8. DOI: 10.1145/3009977.3010026

Face recognition has attained great importance in biometric authentication due to its non-intrusive ability to identify individuals at varying stand-off distances. Face recognition based on multi-spectral imaging has recently gained prominence owing to its ability to capture both spatial and spectral information across the spectrum. Our first contribution in this paper is to apply extended multi-spectral face recognition to two different age groups. The second contribution is to empirically characterize face recognition performance for these two groups. To this end, we developed a multi-spectral imaging sensor that captures facial images at nine spectral bands covering the 530 nm to 1000 nm range, and collected a new facial database of 168 individuals spanning two age groups (≤ 15 years and ≥ 20 years). Extensive experimental evaluation is performed independently on the two age-group databases using four state-of-the-art face recognition algorithms. We evaluate verification and identification rates across the individual spectral bands and the fused spectral bands for both age groups. The results show a higher recognition rate for the ≥ 20 years group than for the ≤ 15 years group, indicating that face recognition performance varies across age groups.
Uncorrelated multiview discriminant locality preserving projection analysis for multiview facial expression recognition
Sunil Kumar, M. Bhuyan, B. Chakraborty
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 86:1–86:8. DOI: 10.1145/3009977.3010056

Several multi-view learning-based methods have recently been proposed and found to be effective in many real-world applications. However, existing multi-view learning-based methods are not suited to finding discriminative directions when the data is multi-modal. In such cases, Locality Preserving Projection (LPP) and Local Fisher Discriminant Analysis (LFDA) are more appropriate for capturing discriminative directions. Furthermore, existing work shows that imposing an uncorrelated constraint on the common space improves classification accuracy. Inspired by these findings, we propose an Uncorrelated Multi-view Discriminant Locality Preserving Projection (UMvDLPP)-based approach. The proposed method searches for a common uncorrelated discriminative space for multiple observable spaces. Moreover, it can handle the multimodal characteristic inherently embedded in multi-view facial expression recognition (FER) data, making it more effective for the multi-view FER problem. Experimental results show that the proposed method outperforms state-of-the-art multi-view learning-based methods.
Enhancement of high dynamic range images using variational calculus regularizer with stochastic resonance
Sumit Kumar, R. K. Jha
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 38:1–38:8. DOI: 10.1145/3009977.3010039

When capturing pictures with an ordinary camera in a scene with harsh or strong lighting, such as a fully sunny day, we often find loss of highlight detail (overexposure) in bright regions and loss of shadow detail (underexposure) in dark regions. In this manuscript, a method for recovering fine detail from high dynamic range images is proposed. Our technique is based on variational calculus and dynamic stochastic resonance (DSR). We add a regularizer function to optimise the estimation of the lost detail in the overexposed and underexposed regions of the image. We compress the dynamic range of the luminance image by attenuating gradients in proportion to their magnitude, so that large gradients are attenuated strongly and small gradients only slightly. At the same time, DSR is used to improve the underexposed regions of the image. Experimental results show that the proposed technique enhances image quality in both overexposed and underexposed regions. Compared with most state-of-the-art techniques, the proposed technique performs better than or comparably to existing methods.
Fusion-based skin detection using image distribution model
B. Chakraborty, M. Bhuyan, Sunil Kumar
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 67:1–67:8. DOI: 10.1145/3009977.3010002

Skin colour detection under poor or varying illumination conditions is a significant challenge for many image processing and human-computer interaction applications. In this paper, a novel skin detection method utilizing the image pixel distribution in a given colour space is proposed. The pixel distribution of an image can better localize the actual skin colour distribution of that image. Hence, a local skin distribution model (LSDM) is derived using the image pixel distribution model and its similarity to the global skin distribution model (GSDM). Finally, a fusion-based skin model is obtained from both the GSDM and the LSDM. Subsequently, a dynamic region growing method is employed to improve the overall detection rate. Experimental results show that the proposed skin detection method significantly improves detection accuracy in the presence of varying illumination conditions.
MPMF: multi-part multi-feature based object tracking
Neha Bhargava, S. Chaudhuri
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 17:1–17:8. DOI: 10.1145/3009977.3010057

The objective of tracking is to determine the states of an object in video frames while maintaining appearance and motion consistency. In this paper, we propose a novel multi-part multi-feature (MPMF) object tracker, which falls into the category of part-based trackers. We represent a target by a set of fixed parts (not semantic parts such as limbs or face), and each part is represented by a set of features. The multi-part representation aids partial-occlusion handling, while the multi-feature description increases the robustness of the target representation. Instead of considering all features of all parts, we measure the tracker's confidence for a candidate using only the candidate's strong features, ensuring that weak features do not interfere with the decision. We also present an automatic method for selecting this subset of appropriate features for each part. To increase the tracker's speed and reduce the number of erroneous candidates, we do not search the whole frame; instead, the search area is kept adaptive, with its size depending on the tracker's confidence in the predicted object location. Additionally, more parts and features are easy to integrate into the proposed tracker. Results on various challenging videos from the VOT dataset are encouraging, and MPMF outperforms state-of-the-art trackers on some of the standard challenging videos.
Complementary tracker's fusion for robust visual tracking
S. Kakanuru, Madan Kumar Rapuru, Deepak Mishra, R. S. S. Gorthi
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 51:1–51:8. DOI: 10.1145/3009977.3010006

Although visual object tracking algorithms can handle various challenging scenarios individually, none is robust enough to handle all the challenges simultaneously. This paper proposes a novel robust tracking algorithm that elegantly fuses the frame-level detection strategy of Tracking-Learning-Detection (TLD) with the systematic model-update strategy of the Kernelized Correlation Filter (KCF) tracker. The motivation behind this selection of trackers is their complementary nature in handling tracking challenges. The proposed algorithm combines the two trackers using a conservative correspondence measure with strategic model updates, taking advantage of both and compensating for the shortcomings of each through the strengths of the other. The fusion approach is quite general: any complementary tracker (not just KCF) can be fused with TLD to leverage the best performance. Extensive evaluation of the proposed method on the ALOV300++, Online Tracking Benchmark (OTB), and Visual Object Tracking (VOT2015) datasets, using different metrics, demonstrates its superiority in robustness and success rate over state-of-the-art trackers.
Qualitative spatial and temporal reasoning over diagrams for activity recognition
Chayanika Deka Nath, S. Hazarika
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 72:1–72:6. DOI: 10.1145/3009977.3010015

In the quest for an efficient representation schema for activity recognition in video, we employ techniques combining diagrammatic reasoning (DR) with qualitative spatial and temporal reasoning (QSTR). QSTR allows qualitative abstraction of spatio-temporal relations among objects of interest, but is often thwarted by ambiguous conclusions. 'Diagrams' influence cognitive reasoning by externalizing mental context; hence, QSTR over diagrams holds promise. We define 'diagrams' as explicit representations of objects of interest and their spatial information on a 2D grid. A sequence of 'key diagrams' is extracted, and inter-diagrammatic reasoning operators combine key diagrams to obtain spatio-temporal information. The qualitative spatial and temporal information thus obtained defines short-term activities (STAs), and several STAs combine to form long-term activities (LTAs). The sequence of STAs is used as a feature vector for LTA recognition. We evaluate our approach on six LTAs from the CAVIAR dataset.
Event recognition in broadcast soccer videos
Himangi Saraogi, R. Sharma, Vijay Kumar
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 14:1–14:7. DOI: 10.1145/3009977.3010074

Automatic recognition of important events in broadcast soccer videos plays a vital role in many applications, including video summarization, indexing, content-based search, and performance analysis of players and teams. This paper proposes an approach to soccer event recognition using deep convolutional features combined with domain-specific cues. For the deep representation, we use the recently proposed trajectory-based deep convolutional descriptor (TDD) [1], which samples and pools discriminatively trained convolutional features around improved trajectories. We further improve performance by incorporating domain-specific knowledge based on the camera's view type and position, which capture the statistics of event occurrence in different play-field regions and at different zoom levels, respectively. We conduct extensive experiments on six hours of soccer matches and show the effectiveness of the deep video representation for soccer and the improvements obtained using domain-specific cues.
Immersive augmented reality system for assisting needle positioning during ultrasound guided intervention
P. Kanithi, J. Chatterjee, D. Sheet
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 65:1–65:8. DOI: 10.1145/3009977.3010023

Ultrasound (US) guided intervention is a surgical procedure in which the clinician uses real-time imaging to track the position of the needle and correct its trajectory so as to steer it accurately to the lesion of interest. However, the needle is visible in the US image only when aligned in-plane with the scanning plane of the US probe. In practice, clinicians often use a mechanical needle guide, which restricts the available degrees of freedom in US probe movement; alternatively, during free-hand procedures, they resort to multiple needle punctures to achieve in-plane positioning. The present work details an augmented reality (AR) system providing patient-comfort-centric aid to needle intervention through an overlaid visualization of the needle trajectory on the US frame prior to insertion. This is implemented by continuous visual tracking of the US probe and the needle in a 3D world coordinate system using fiducial markers. The tracked marker positions are used to draw the needle trajectory and tip, visualized in real time as an overlay on the US feed. Subsequently, the continuously tracked US probe and needle, together with the navigation-assistance information, are overlaid on the visual feed of a head-mounted display (HMD) to generate a fully immersive AR experience for the clinician.
How much can a Gaussian smoother denoise?
S. Gubbi, Ashutosh Gupta, C. Seelamantula
Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (2016), pp. 7:1–7:8. DOI: 10.1145/3009977.3010027

Recently, a suite of increasingly sophisticated methods has been developed to suppress additive noise in images. Most of these methods exploit the sparsity of the underlying signal in a specific transform domain to achieve good visual or quantitative results, applying relatively complex statistical modelling techniques to separate the noise from the signal. In this paper, we demonstrate that a spatially adaptive Gaussian smoother can be a very effective solution to the image denoising problem. To derive optimal parameter estimates for the Gaussian smoothing kernel, we derive and deploy a surrogate of the mean-squared error (MSE) risk, similar to Stein's estimator for Gaussian-distributed noise. However, unlike Stein's estimator or its counterparts for other noise distributions, the proposed generic risk estimator (GenRE) uses only the first- and second-order moments of the noise distribution and is agnostic to its exact form. By locally adapting the parameters of the Gaussian smoother, we obtain a denoising function whose performance (quantified by the peak signal-to-noise ratio, PSNR) is competitive with far more sophisticated methods reported in the literature. To exploit the parallelism offered by the proposed method, we also provide a graphics processing unit (GPU) based implementation.