Global Optimality in Inductive Matrix Completion
Pub Date: 2018-05-01 | DOI: 10.1109/ICASSP.2018.8462250
Mohsen Ghassemi, A. Sarwate, Naveen Goela
Inductive matrix completion (IMC) is a model for incorporating side information, in the form of “features” of the row and column entities of an unknown matrix, into the matrix completion problem. As side information, features can substantially reduce the number of observed entries required to reconstruct the unknown matrix from its given entries. The IMC problem can be formulated as a low-rank matrix recovery problem in which the observed entries are seen as measurements of a smaller matrix that models the interaction between the column and row features. We take advantage of this property to study the optimization landscape of the factorized IMC problem. In particular, we show that the critical points of the objective function of this problem are either global minima that correspond to the true solution or “escapable” saddle points. This result implies that any minimization algorithm with guaranteed convergence to a local minimum can be used to solve the factorized IMC problem.
{"title":"Global Optimality in Inductive Matrix Completion","authors":"Mohsen Ghassemi, A. Sarwate, Naveen Goela","doi":"10.1109/ICASSP.2018.8462250","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462250","url":null,"abstract":"Inductive matrix completion (IMC) is a model for incorporating side information in form of “features” of the row and column entities of an unknown matrix in the matrix completion problem. As side information, features can substantially reduce the number of observed entries required for reconstructing an unknown matrix from its given entries. The IMC problem can be formulated as a low-rank matrix recovery problem where the observed entries are seen as measurements of a smaller matrix that models the interaction between the column and row features. We take advantage of this property to study the optimization landscape of the factorized IMC problem. In particular, we show that the critical points of the objective function of this problem are either global minima that correspond to the true solution or are “escapable” saddle points. This result implies that any minimization algorithm with guaranteed convergence to a local minimum can be used for solving the factorized IMC problem.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"44 1","pages":"2226-2230"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87030175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel Realizations of Speech-Driven Head Movements with Generative Adversarial Networks
Pub Date: 2018-05-01 | DOI: 10.1109/ICASSP.2018.8461967
Najmeh Sadoughi, C. Busso
Head movement is an integral part of face-to-face communication, so it is important to investigate methodologies for generating naturalistic movements for conversational agents (CAs). The predominant method for head movement generation uses rules based on the meaning of the message. However, the variations of head movements produced by these methods are bounded by a predefined dictionary of gestures. Speech-driven methods offer an alternative approach, learning the relationship between speech and head movements from real recordings. However, previous studies do not generate novel realizations for a repeated speech signal. A conditional generative adversarial network (GAN) provides a framework for generating multiple realizations of head movements for each speech segment by sampling from a conditioned distribution. We build a conditional GAN with bidirectional long short-term memory (BLSTM), which is suitable for capturing the long- and short-term dependencies of time-continuous signals. This model learns the distribution of head movements conditioned on speech prosodic features. We compare this model with a dynamic Bayesian network (DBN) and with BLSTM models optimized to reduce mean squared error (MSE) or to increase concordance correlation. Objective and subjective evaluations show better performance for the conditional GAN model than for these baseline systems.
{"title":"Novel Realizations of Speech-Driven Head Movements with Generative Adversarial Networks","authors":"Najmeh Sadoughi, C. Busso","doi":"10.1109/ICASSP.2018.8461967","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461967","url":null,"abstract":"Head movement is an integral part of face-to-face communications. It is important to investigate methodologies to generate naturalistic movements for conversational agents (CAs). The predominant method for head movement generation is using rules based on the meaning of the message. However, the variations of head movements by these methods are bounded by the predefined dictionary of gestures. Speech-driven methods offer an alternative approach, learning the relationship between speech and head movements from real recordings. However, previous studies do not generate novel realizations for a repeated speech signal. Conditional generative adversarial network (GAN) provides a framework to generate multiple realizations of head movements for each speech segment by sampling from a conditioned distribution. We build a conditional GAN with bidirectional long-short term memory (BLSTM), which is suitable for capturing the long-short term dependencies of time-continuous signals. This model learns the distribution of head movements conditioned on speech prosodic features. We compare this model with a dynamic Bayesian network (DBN) and BLSTM models optimized to reduce mean squared error (MSE) or to increase concordance correlation. The objective evaluations and subjective evaluations of the results showed better performance for the conditional GAN model compared with these baseline systems.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"10 1","pages":"6169-6173"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85109420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unlimited Sampling of Sparse Signals
Pub Date: 2018-04-30 | DOI: 10.1109/ICASSP.2018.8462231
A. Bhandari, F. Krahmer, R. Raskar
In a recent paper [1], we introduced the concept of “Unlimited Sampling”. This approach circumvents the clipping or saturation problem in conventional analog-to-digital converters (ADCs) by considering a radically different ADC architecture that resets the input voltage before saturation. Such ADCs, also known as self-reset ADCs (SR-ADCs), sense modulo samples. In analogy to Shannon's sampling theorem, the unlimited sampling theorem proves that a bandlimited signal can be recovered from modulo samples provided that a certain sampling density criterion, independent of the ADC threshold, is satisfied. Our result thus allows perfect recovery of a bandlimited function whose amplitude exceeds the ADC threshold by orders of magnitude. Capitalizing on this result, in this paper we consider the inverse problem of recovering a sparse signal from its low-pass filtered version. This problem arises frequently in several areas of science and engineering; in the context of signal processing, it is studied in several flavors, namely sparse or FRI sampling, super-resolution, and sparse deconvolution. Considering the SR-ADC architecture, we develop a sampling theory for modulo sampling of low-pass filtered spikes. Our main result consists of a new sparse sampling theorem and an algorithm that stably recovers a $K$-sparse signal from low-pass, modulo samples. We validate our results using numerical experiments.
{"title":"Unlimited Sampling of Sparse Signals","authors":"A. Bhandari, F. Krahmer, R. Raskar","doi":"10.1109/ICASSP.2018.8462231","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462231","url":null,"abstract":"In a recent paper [1], we introduced the concept of “Unlimited Sampling”. This unique approach circumvents the clipping or saturation problem in conventional analog-to-digital converters (ADCs) by considering a radically different ADC architecture which resets the input voltage before saturation. Such ADCs, also known as Self-Reset ADCs (SR-ADCs), allow for sensing modulo samples. In analogy to Shannon's sampling theorem, the unlimited sampling theorem proves that a bandlimited signal can be recovered from modulo samples provided that a certain sampling density criterion, that is independent of the ADC threshold, is satisfied. In this way, our result allows for perfect recovery of a bandlimited function whose amplitude exceeds the ADC threshold by orders of magnitude. By capitalizing on this result, in this paper, we consider the inverse problem of recovering a sparse signal from its low-pass filtered version. This problem frequently arises in several areas of science and engineering and in context of signal processing, it is studied in several flavors, namely, sparse or FRI sampling, super-resolution and sparse deconvolution. By considering the SR-ADC architecture, we develop a sampling theory for modulo sampling of lowpass filtered spikes. Our main result consists of a new sparse sampling theorem and an algorithm which stably recovers a $K$ -sparse signal from low-pass, modulo samples. We validate our results using numerical experiments.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"49 1","pages":"4569-4573"},"PeriodicalIF":0.0,"publicationDate":"2018-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86844752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sub-Diffraction Imaging Using Fourier Ptychography and Structured Sparsity
Pub Date: 2018-04-30 | DOI: 10.1109/ICASSP.2018.8461302
Gauri Jagatap, Zhengyu Chen, C. Hegde, Namrata Vaswani
We consider the problem of super-resolution for sub-diffraction imaging. We adapt conventional Fourier ptychographic approaches to the case where the images to be acquired have an underlying structured sparsity. We propose sub-sampling strategies that can be easily adapted to existing ptychographic setups. We then use a novel technique called CoPRAM, with some modifications, to recover sparse (and block-sparse) images from sub-sampled ptychographic measurements. We demonstrate experimentally that this algorithm performs better than existing phase retrieval techniques in terms of quality of reconstruction, while using fewer samples.
{"title":"Sub-Diffraction Imaging Using Fourier Ptychography and Structured Sparsity","authors":"Gauri Jagatap, Zhengyu Chen, C. Hegde, Namrata Vaswani","doi":"10.1109/ICASSP.2018.8461302","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461302","url":null,"abstract":"We consider the problem of super-resolution for sub-diffraction imaging. We adapt conventional Fourier ptychographic approaches, for the case where the images to be acquired have an underlying structured sparsity. We propose some sub-sampling strategies which can be easily adapted to existing ptychographic setups. We then use a novel technique called CoPRAM with some modifications, to recover sparse (and block sparse) images from sub-sampled pty-chographic measurements. We demonstrate experimentally that this algorithm performs better than existing phase retrieval techniques, in terms of quality of reconstruction, using fewer number of samples.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"71 1","pages":"6493-6497"},"PeriodicalIF":0.0,"publicationDate":"2018-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73530120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Study of Dense Network Approaches for Speech Emotion Recognition
Pub Date: 2018-04-30 | DOI: 10.1109/ICASSP.2018.8461866
Mohammed Abdel-Wahab, C. Busso
Deep neural networks have proven very effective in various classification problems and show great promise for emotion recognition from speech. Studies have proposed various architectures that further improve the performance of emotion recognition systems. However, there are still open questions regarding the best approach to building a speech emotion recognition system. Would the system's performance improve with more labeled data? How much do we benefit from data augmentation? Which activation and regularization schemes are more beneficial? How does the depth of the network affect performance? We are collecting the MSP-Podcast corpus, a large dataset with over 30 hours of data, which provides an ideal resource to address these questions. This study explores various dense architectures to predict arousal, valence and dominance scores. We investigate varying the training set size and the width and depth of the network, as well as the activation functions used during training. We also study the effect of data augmentation on the network's performance. We find that a bigger training set improves performance. Batch normalization is crucial to achieving good performance for deeper networks. We do not observe significant performance differences between residual and dense networks.
{"title":"Study of Dense Network Approaches for Speech Emotion Recognition","authors":"Mohammed Abdel-Wahab, C. Busso","doi":"10.1109/ICASSP.2018.8461866","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461866","url":null,"abstract":"Deep neural networks have been proven to be very effective in various classification problems and show great promise for emotion recognition from speech. Studies have proposed various architectures that further improve the performance of emotion recognition systems. However, there are still various open questions regarding the best approach to building a speech emotion recognition system. Would the system's performance improve if we have more labeled data? How much do we benefit from data augmentation? What activation and regularization schemes are more beneficial? How does the depth of the network affect the performance? We are collecting the MSP-Podcast corpus, a large dataset with over 30 hours of data, which provides an ideal resource to address these questions. This study explores various dense architectures to predict arousal, valence and dominance scores. We investigate varying the training set size, width, and depth of the network, as well as the activation functions used during training. We also study the effect of data augmentation on the network's performance. We find that bigger training set improves the performance. Batch normalization is crucial to achieving a good performance for deeper networks. We do not observe significant differences in the performance in residual networks compared to dense networks.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"59 1","pages":"5084-5088"},"PeriodicalIF":0.0,"publicationDate":"2018-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79464478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Foreground Harmonic Noise Reduction for Robust Audio Fingerprinting
Pub Date: 2018-04-30 | DOI: 10.1109/ICASSP.2018.8462636
Matthew C. McCallum
Audio fingerprinting systems are often well designed to cope with a range of broadband noise types; however, they cope less well when presented with additive noise containing sinusoidal components. This is largely because, in a short-time signal representation (over periods of ≈ 20 ms), these noise components are largely indistinguishable from salient components of the desirable signal that is to be fingerprinted. In this paper, a front-end sinusoidal noise reduction procedure is introduced that removes the most detrimental of the sinusoidal noise components, thereby improving the audio fingerprinting system's performance. This is achieved by grouping short-time sinusoidal components into pitch contours via magnitude, frequency and phase characteristics, and identifying noisy contours as those whose characteristics are outliers in the distribution of all pitch contours in the signal. With this paper's contribution, the recognition rate of an industrial-scale fingerprinting system is increased by up to 8.4%.
{"title":"Foreground Harmonic Noise Reduction for Robust Audio Fingerprinting","authors":"Matthew C. McCallum","doi":"10.1109/ICASSP.2018.8462636","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462636","url":null,"abstract":"Audio fingerprinting systems are often well designed to cope with a range of broadband noise types however they cope less well when presented with additive noise containing sinusoidal components. This is largely due to the fact that in a short-time signal representation (over periods of ≈ 20ms) these noise components are largely indistinguishable from salient components of the desirable signal that is to be fingerprinted. In this paper a front -end sinusoidal noise reduction procedure is introduced that is able to remove the most detrimental of the sinusoidal noise components thereby improving the audio fingerprinting system's performance. This is achievable by grouping short-time sinusoidal components into pitch contours via magnitude, frequency and phase characteristics, and identifying noisy contours as those with characteristics that are outliers in the distribution of all pitch contours in the signal. With this paper's contribution, the recognition rate in an industrial scale fingerprinting system is increased by up to 8.4%.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"15 2","pages":"3146-3150"},"PeriodicalIF":0.0,"publicationDate":"2018-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91400852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speech Prediction Using an Adaptive Recurrent Neural Network with Application to Packet Loss Concealment
Pub Date: 2018-04-30 | DOI: 10.1109/ICASSP.2018.8462185
Reza Lotfidereshgi, P. Gournay
This paper proposes a novel approach to speech signal prediction based on a recurrent neural network (RNN). Unlike existing RNN-based predictors, which operate on parametric features and are trained offline on a large collection of such features, the proposed predictor operates directly on speech samples and is trained online on the recent past of the speech signal. Optionally, the network can be pre-trained offline to speed up convergence at start-up. The proposed predictor is a single end-to-end network that captures all sorts of dependencies between samples, and therefore has the potential to outperform classical linear/non-linear and short-term/long-term speech predictor structures. We apply it to the packet loss concealment (PLC) problem and show that it outperforms the standard ITU G.711 Appendix I PLC technique.
{"title":"Speech Prediction Using an Adaptive Recurrent Neural Network with Application to Packet Loss Concealment","authors":"Reza Lotfidereshgi, P. Gournay","doi":"10.1109/ICASSP.2018.8462185","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462185","url":null,"abstract":"This paper proposes a novel approach for speech signal prediction based on a recurrent neural network (RNN). Unlike existing RNN-based predictors, which operate on parametric features and are trained offline on a large collection of such features, the proposed predictor operates directly on speech samples and is trained online on the recent past of the speech signal. Optionally, the network can be pre-trained offline to speed-up convergence at start-up. The proposed predictor is a single end-to-end network that captures all sorts of dependencies between samples, and therefore has the potential to outperform classicallinear/non-linear and short-termllong-term speech predictor structures. We apply it to the packet loss concealment (PLC) problem and show that it outperforms the standard ITU G.711 Appendix I PLC technique.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"16 1","pages":"5394-5398"},"PeriodicalIF":0.0,"publicationDate":"2018-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84175903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody
Pub Date: 2018-04-29 | DOI: 10.1109/ICASSP.2018.8462470
G. Henter, Jaime Lorenzo-Trueba, Xin Wang, M. Kondo, J. Yamagishi
We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating a controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. Natural prosody is achieved by copying durations and pitch contours from a pre-recorded utterance of the desired prompt. We call this paradigm “cyborg speech”, as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quinphone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
{"title":"Cyborg Speech: Deep Multilingual Speech Synthesis for Generating Segmental Foreign Accent with Natural Prosody","authors":"G. Henter, Jaime Lorenzo-Trueba, Xin Wang, M. Kondo, J. Yamagishi","doi":"10.1109/ICASSP.2018.8462470","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462470","url":null,"abstract":"We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm “cyborg speech” as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quinphone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"157 1","pages":"4799-4803"},"PeriodicalIF":0.0,"publicationDate":"2018-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82463412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learned Forensic Source Similarity for Unknown Camera Models
Pub Date: 2018-04-27 | DOI: 10.1109/ICASSP.2018.8462585
O. Mayer, M. Stamm
Information about an image's source camera model is important knowledge in many forensic investigations. In this paper we propose a system that compares two image patches to determine whether they were captured by the same camera model. To do this, we first train a CNN-based feature extractor to output generic, high-level features that encode information about the source camera model of an image patch. Then, we learn a similarity measure that maps pairs of these features to a score indicating whether the two image patches were captured by the same or different camera models. We show that our proposed system accurately determines whether two patches were captured by the same or different camera models, even when the camera models are unknown to the investigator. We also demonstrate the utility of this approach for image splicing detection and localization.
{"title":"Learned Forensic Source Similarity for Unknown Camera Models","authors":"O. Mayer, M. Stamm","doi":"10.1109/ICASSP.2018.8462585","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8462585","url":null,"abstract":"Information about an image's source camera model is important knowledge in many forensic investigations. In this paper we propose a system that compares two image patches to determine if they were captured by the same camera model. To do this, we first train a CNN based feature extractor to output generic, high level features which encode information about the source camera model of an image patch. Then, we learn a similarity measure that maps pairs of these features to a score indicating whether the two image patches were captured by the same or different camera models. We show that our proposed system accurately determines if two patches were captured by the same or different camera models, even when the camera models are unknown to the investigator. We also demonstrate the utility of this approach for image splicing detection and localization.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"35 1","pages":"2012-2016"},"PeriodicalIF":0.0,"publicationDate":"2018-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76238280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective Noise Removal and Unified Model of Hybrid Feature Space Optimization for Automated Cardiac Anomaly Detection Using Phonocardiogram Signals
Pub Date: 2018-04-27 | DOI: 10.1109/ICASSP.2018.8461765
A. Ukil, S. Bandyopadhyay, Chetanya Puri, Rituraj Singh, A. Pal
In this paper, we present completely automated cardiac anomaly detection for remote screening of cardiovascular abnormality using phonocardiogram (PCG), or heart sound, signals. Although the PCG carries significant and vital cardiac health information and the signature of cardiac abnormalities, the presence of substantial noise often prevents effective analysis of the cardiac condition. Our proposed method intelligently identifies and eliminates noisy PCG signals and consequently detects pathological abnormality conditions. We further present a unified model of a hybrid feature selection method. Our feature selection model is diversity-optimized and cost-sensitive over the conditional likelihood of the training and validation examples, which maximizes classification model performance. We employ a multi-stage hybrid feature selection process involving a first-level filter method and a second-level wrapper method. We achieve 85% detection accuracy using the publicly available MIT-PhysioNet Challenge 2016 dataset, consisting of more than 3000 annotated PCG signals.
{"title":"Effective Noise Removal and Unified Model of Hybrid Feature Space Optimization for Automated Cardiac Anomaly Detection Using Phonocardiogarm Signals","authors":"A. Ukil, S. Bandyopadhyay, Chetanya Puri, Rituraj Singh, A. Pal","doi":"10.1109/ICASSP.2018.8461765","DOIUrl":"https://doi.org/10.1109/ICASSP.2018.8461765","url":null,"abstract":"In this paper, we present completely automated cardiac anomaly detection for remote screening of cardio-vascular abnormality using Phonocardiogram (PCG) or heart sound signal. Even though PCG contains significant and vital cardiac health information and cardiac abnormality signature, the presence of substantial noise does not guarantee highly effective analysis of cardiac condition. Our proposed method intelligently identifies and eliminates noisy PCG signal and consequently detects pathological abnormality condition. We further present a unified model of hybrid feature selection method. Our feature selection model is diversity optimized and cost-sensitive over conditional likelihood of the training and validation examples that maximizes classification model performance. We employ multi-stage hybrid feature selection process involving first level filter method and second level wrapper method. We achieve 85% detection accuracy by using publicly available MIT-Physionet challenge 2016 datasets consisting of more than 3000 annotated PCG signals.","PeriodicalId":6638,"journal":{"name":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"62 1","pages":"866-870"},"PeriodicalIF":0.0,"publicationDate":"2018-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74399479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}