Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168134
Mariia Dmitrieva, Matias Valdenegro-Toro, K. Brown, G. Heald, D. Lane
This paper presents the classification of spherical objects with different physical properties. The classification is based on the energy distribution in wideband pulses that have been scattered from the objects. The echo is represented in the time-frequency domain (TFD) using the short-time Fourier transform (STFT) with different window lengths, and is fed into a convolutional neural network (CNN) for classification. The results for different window lengths are analysed to study the influence of time and frequency resolution on classification. The CNN achieves its best result, an accuracy of (98.44 ± 0.8)% over 5 object classes, when trained on grayscale TFD images computed with an STFT window length of 0.1 ms. The CNN is compared with a multilayer perceptron classifier, a support vector machine, and gradient boosting.
Title: Object classification with convolution neural network based on the time-frequency representation of their echo
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
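The time-frequency trade-off studied in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the sampling rate, the synthetic chirp standing in for an echo, the Hann window, the 50% overlap, and the helper name `stft_magnitude` are all assumptions made for the example. Only the 0.1 ms window length comes from the abstract.

```python
import numpy as np

def stft_magnitude(signal, fs, window_s, overlap=0.5):
    """Magnitude STFT with a Hann window of `window_s` seconds (hypothetical helper)."""
    n_win = max(2, int(round(window_s * fs)))    # window length in samples
    hop = max(1, int(n_win * (1.0 - overlap)))   # step between frames
    window = np.hanning(n_win)
    starts = range(0, len(signal) - n_win + 1, hop)
    frames = np.stack([signal[s:s + n_win] * window for s in starts])
    # Rows are frequency bins, columns are time frames.
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 1_000_000                  # assumed sampling rate (1 MHz)
t = np.arange(2000) / fs        # 2 ms of signal
sweep = 100e3 / 0.002           # assumed chirp rate: 100 kHz over 2 ms
echo = np.sin(2 * np.pi * (50e3 * t + 0.5 * sweep * t**2))

short_win = stft_magnitude(echo, fs, 0.0001)  # 0.1 ms window: 10 kHz bin spacing
long_win = stft_magnitude(echo, fs, 0.0005)   # 0.5 ms window: 2 kHz bin spacing
```

A shorter window gives more time frames but coarser frequency bins, and a longer window does the opposite. Each resulting magnitude array could be scaled to a grayscale image before being passed to a classifier.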
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168174
Andros Tjandra, S. Sakti, Satoshi Nakamura
This paper constructs speech features based on a generative model, the deep latent Gaussian model (DLGM), which is trained using the stochastic gradient variational Bayes (SGVB) algorithm and performs efficient approximate inference and learning with a directed probabilistic graphical model. The trained DLGM then generates latent variables based on a Gaussian distribution, which are used as new features for a deep neural network (DNN) acoustic model. We compare results with and without the features transformed by the DLGM, and also examine the benefit of combining the proposed and original features in a single DNN. Our experimental results show that the proposed DLGM features improve automatic speech recognition (ASR) performance. Furthermore, the DNN acoustic model that combines the proposed and original features gives the best performance.
Title: Speech recognition features based on deep latent Gaussian models
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
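Two building blocks of SGVB training can be sketched in isolation: the reparameterized Gaussian sample and the closed-form KL term against a standard normal prior. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior, which is the common SGVB setup but is not confirmed by the abstract. The array sizes, the zero-valued "encoder outputs", and the feature concatenation at the end are hypothetical stand-ins.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 8))        # hypothetical encoder means: 4 frames, 8 latent dims
log_var = np.zeros((4, 8))   # hypothetical encoder log-variances
z = sample_latent(mu, log_var, rng)

# Assumed way of combining features: concatenate original and latent features.
original = rng.standard_normal((4, 40))   # stand-in for 40-dim acoustic features
combined = np.concatenate([original, z], axis=1)
```

Writing the sample as a deterministic function of `mu`, `log_var`, and external noise is what allows gradients to flow through the sampling step during training.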
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168126
Kimmo Suotsalo, S. Särkkä
This paper proposes a linear stochastic state space model for electrocardiogram signal processing and analysis. The model is obtained as a discretized version of the Wiener process acceleration model. It is combined with a fixed-lag Rauch-Tung-Striebel smoother to perform online signal denoising, feature extraction, and beat classification. The results indicate that the proposed approach outperforms a conventional FIR filter in terms of signal-to-noise ratio improvement, and that it can be used for highly accurate online classification of normal beats and premature ventricular contractions. The benefits of the model include closed-form solutions to the optimal filtering and smoothing problems, quick adaptation to sudden changes in beat morphology and heart rate, simple and fast initialization, preprocessing-free operation, and intuitive interpretation of the system state.
Title: A linear stochastic state space model for electrocardiograms
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
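The model class named in the abstract above can be written down compactly. The sketch below builds the standard discretized Wiener process acceleration model (state: signal, first derivative, second derivative) and runs a Kalman filter followed by a fixed-interval Rauch-Tung-Striebel pass. Note the swap: the paper uses a fixed-lag smoother for online operation, while this example smooths the whole record at once for brevity. The sine test signal, the noise level, and the values of `q`, `r`, and `P0` are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

def wpa_model(dt, q):
    """Discretized Wiener process acceleration model with spectral density q."""
    A = np.array([[1.0, dt, dt**2 / 2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    Q = q * np.array([[dt**5 / 20, dt**4 / 8, dt**3 / 6],
                      [dt**4 / 8, dt**3 / 3, dt**2 / 2],
                      [dt**3 / 6, dt**2 / 2, dt]])
    return A, Q

def kalman_rts(y, A, Q, r, m0, P0):
    """Kalman filter plus fixed-interval RTS smoother for scalar measurements."""
    H = np.array([[1.0, 0.0, 0.0]])   # only the signal itself is observed
    n, d = len(y), A.shape[0]
    m_p, P_p = np.zeros((n, d)), np.zeros((n, d, d))   # predicted moments
    m_f, P_f = np.zeros((n, d)), np.zeros((n, d, d))   # filtered moments
    m, P = m0, P0
    for k in range(n):
        m, P = A @ m, A @ P @ A.T + Q                  # prediction step
        m_p[k], P_p[k] = m, P
        S = H @ P @ H.T + r                            # innovation variance
        K = P @ H.T / S                                # Kalman gain
        m = m + (K * (y[k] - H @ m)).ravel()           # update step
        P = P - K @ S @ K.T
        m_f[k], P_f[k] = m, P
    m_s, P_s = m_f.copy(), P_f.copy()
    for k in range(n - 2, -1, -1):                     # backward smoothing pass
        G = np.linalg.solve(P_p[k + 1], A @ P_f[k]).T  # smoother gain
        m_s[k] = m_f[k] + G @ (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G @ (P_s[k + 1] - P_p[k + 1]) @ G.T
    return m_s, P_s

rng = np.random.default_rng(0)
fs = 250.0                                   # assumed sampling rate in Hz
t = np.arange(500) / fs
clean = np.sin(2 * np.pi * 1.0 * t)          # stand-in for a clean signal
noisy = clean + 0.2 * rng.standard_normal(t.size)

A, Q = wpa_model(1.0 / fs, q=1e4)
m_s, _ = kalman_rts(noisy, A, Q, r=0.04,
                    m0=np.zeros(3), P0=np.diag([1.0, 1e2, 1e4]))
rmse_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rmse_smooth = np.sqrt(np.mean((m_s[:, 0] - clean) ** 2))
```

Because the model is linear and Gaussian, both passes are closed-form matrix recursions, which is the property the abstract highlights.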
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168166
Y. X. Lukic, Carlo Vogt, Oliver Durr, Thilo Stadelmann
Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding. Although the clustering results are then on par with more traditional approaches, such as those using MFCC features, there is room for improvement because these embeddings are trained on a surrogate task, identifying a few specific speakers, that is rather far from segregating unknown voices. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset, which has often been used for speaker clustering experiments. We exceed the clustering performance of all previous approaches while requiring just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
Title: Learning embeddings for speaker clustering based on voice equality
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168193
Alison O'Shea, G. Lightbody, G. Boylan, A. Temko
This study presents a novel end-to-end architecture that learns hierarchical representations from raw EEG data using fully convolutional deep neural networks for the task of neonatal seizure detection. The deep neural network acts as both feature extractor and classifier, allowing for end-to-end optimization of the seizure detector. The designed system is evaluated on a large dataset of continuous, unedited, multichannel neonatal EEG totaling 835 hours and comprising 1389 seizures. The proposed deep architecture, with sample-level filters, achieves an accuracy comparable to that of the state-of-the-art SVM-based neonatal seizure detector, which operates on a set of carefully designed hand-crafted features. The fully convolutional architecture allows the EEG waveforms and patterns that produce high seizure probabilities to be localized for further clinical examination.
Title: Neonatal seizure detection using convolutional neural networks
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168160
Kimmo Suotsalo, S. Särkkä
Ventricular tachycardia, ventricular flutter, and ventricular fibrillation are malignant forms of cardiac arrhythmia whose occurrence may be life-threatening. Several methods exist for detecting these arrhythmias in the electrocardiogram. However, the use of Gaussian process classifiers in this context has not been reported in the current literature. Compared with the popular support vector machines, Gaussian processes have several advantages: they are fully probabilistic, they can be recast in a state-space form compatible with Bayesian filtering, and they can be flexibly combined with first-principles physical models. In this paper we use Gaussian process classification to detect malignant ventricular arrhythmias in the electrocardiogram. We describe how Gaussian process classifiers can be used to solve the detection problem, and show that the proposed classifiers achieve performance comparable to that of state-of-the-art methods, thereby laying promising foundations for a more general electrocardiogram-based arrhythmia detection framework.
Title: Detecting malignant ventricular arrhythmias in electrocardiograms by Gaussian process classification
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-5
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168150
R. Jiang, S. Qi, Yuhui Du, Weizheng Yan, V. Calhoun, T. Jiang, J. Sui
Variation in several brain regions and neural parameters is associated with intelligence. In this study, we used functional connectivity (FC) based on the Brainnetome atlas to predict intelligence quotient (IQ) scores quantitatively, within a prediction framework incorporating advanced feature selection and regression methods. We compared the prediction performance of five regression models and evaluated the effectiveness of feature selection. The best prediction performance was achieved by ReliefF+LASSO, which yielded correlations between predicted and true values of r=0.72 for 174 female subjects and r=0.46 for 186 male subjects in a leave-one-out cross-validation, suggesting that IQ scores can be predicted better for female subjects using precise FCs. Further, weight analysis revealed the most predictive FCs and the relevant regions. The results support the hypothesis that intelligence is characterized by interaction between multiple brain regions, especially the areas implicated by the parieto-frontal integration theory. This study advances our understanding of the biological basis of intelligence through individualized prediction.
Title: Predicting individualized intelligence quotient scores using brainnetome-atlas based functional connectivity
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168110
Shuang Gao, E. Osuch, M. Wammes, J. Théberge, T. Jiang, V. Calhoun, J. Sui
Bipolar disorder (BD) and major depressive disorder (MDD) share depressive symptoms, so discriminating between them in early depressive episodes is a major clinical challenge. Independent components (ICs) extracted from fMRI data have been shown to carry distinguishing information and can be used for classification. Here we extend a previous method that uses multiple fMRI ICs to build a linear subspace for each individual, which is then used as input to classifiers. The similarity matrix between subjects is first calculated using a distance metric based on principal angles, and is then projected into kernel space for support vector machine (SVM) classification of 37 BD and 36 MDD patients. In practice, we apply forward selection to 20 ICs with nested 10-fold cross-validation to select the most discriminative combinations of fMRI ICs, and determine the final diagnosis by majority voting. The results on human data demonstrate that the proposed method performs much better than its initial version [8] (93% vs. 75%), and identifies 5 discriminative fMRI components for distinguishing BD from MDD patients, located mainly in the prefrontal cortex, the default mode network, and the thalamus. This work provides a new framework to help diagnose new patients whose symptoms overlap between BD and MDD; it adds to our understanding of functional deficits in mood disorders, and the identified components may serve as potential biomarkers for differential diagnosis.
Title: Discriminating bipolar disorder from major depression based on kernel SVM using functional independent components
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
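The subspace comparison described in the abstract above rests on principal angles, which can be computed from an SVD. The sketch below is a generic illustration rather than the authors' code: the function names are hypothetical, and the projection kernel (the sum of squared cosines of the principal angles) is one common positive-definite choice for subspace similarity. The abstract does not say which kernel the paper uses.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)   # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)   # orthonormal basis of span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def projection_kernel(A, B):
    """Assumed similarity: sum of squared cosines of the principal angles."""
    return float(np.sum(np.cos(principal_angles(A, B)) ** 2))

# Toy subspaces in R^4; columns play the role of one subject's components.
e = np.eye(4)
S1 = e[:, [0, 1]]   # span(e1, e2)
S2 = e[:, [0, 2]]   # span(e1, e3): shares one direction with S1
S3 = e[:, [2, 3]]   # span(e3, e4): orthogonal to S1

angles_12 = principal_angles(S1, S2)
angles_13 = principal_angles(S1, S3)
```

A matrix of such kernel values between all pairs of subjects could then be passed to an SVM that accepts a precomputed kernel.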
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168116
Li Chai, Jun Du, Yannan Wang
Recently, the minimum mean squared error (MMSE) has become the benchmark optimization criterion for deep neural network (DNN) based speech enhancement. In this study, we propose a probabilistic learning framework for estimating the DNN parameters in single-channel speech enhancement. First, a statistical analysis shows that the prediction error at the DNN output closely follows a unimodal density for each log-power spectral component. Accordingly, we present a maximum likelihood (ML) approach to DNN parameter learning that characterizes the prediction error vector as a multivariate Gaussian density with a zero mean vector and an unknown covariance matrix. We demonstrate that the proposed learning approach generalizes better than MMSE-based DNN learning to unseen noise types and can significantly reduce speech distortion in low-SNR environments.
Title: Gaussian density guided deep neural network for single-channel speech enhancement
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
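The relation between the ML criterion and MMSE described in the abstract above can be made explicit. The following is a sketch under the abstract's stated assumption of a zero-mean multivariate Gaussian error. The symbols (W for the DNN parameters, e_n for the error vector of frame n, D for its dimension, N for the number of frames) are notation introduced here, not taken from the paper, and the abstract does not say how the paper alternates or combines the updates of W and the covariance.

```latex
% Assumed notation: e_n = \hat{x}_n(W) - x_n is the prediction error for frame n,
% with e_n \sim \mathcal{N}(0, \Sigma) and \Sigma unknown.
\begin{align}
-\log p(e_1,\dots,e_N \mid W, \Sigma)
  &= \frac{N}{2}\log\lvert\Sigma\rvert
   + \frac{1}{2}\sum_{n=1}^{N} e_n^{\top}\Sigma^{-1}e_n
   + \frac{ND}{2}\log(2\pi), \\
\hat{\Sigma}
  &= \frac{1}{N}\sum_{n=1}^{N} e_n e_n^{\top}
  \qquad \text{(ML estimate for fixed } W\text{)}, \\
\Sigma = \sigma^{2} I \;&\Rightarrow\;
  -\log p = \frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\lVert e_n\rVert^{2} + \text{const}.
\end{align}
```

The last line shows that fixing the covariance to a scaled identity recovers the squared-error criterion, so MMSE training is the special case in which all spectral components are assumed independent with equal error variance.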
Pub Date: 2017-09-01 DOI: 10.1109/MLSP.2017.8168161
P. Rodríguez
We present a simple and computationally efficient algorithm, based on the accelerated Newton's method, to solve the root-finding problem associated with projection onto the ℓ1-ball. Given an interpretation of Michelot's algorithm as Newton's method, our algorithm can be understood as an accelerated version of Michelot's algorithm that needs significantly fewer major iterations to converge to the solution. Although the worst-case performance of the proposed algorithm is O(n²), it exhibits O(n) performance in practice, and it is empirically demonstrated to be competitive with or faster than existing methods.
Title: An accelerated Newton's method for projections onto the ℓ1-ball
Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6
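For context, the baseline that the abstract above accelerates can be sketched briefly. The code below implements plain Michelot-style projection onto the ℓ1-ball, not the paper's accelerated variant. Each pass updates the threshold to (sum of active magnitudes minus radius) divided by the number of active entries, which is the Newton step on f(λ) = Σ max(|x_i| − λ, 0) − radius. The function name and the example vector are illustrative assumptions, and the radius is assumed to be positive.

```python
import numpy as np

def project_l1_ball(x, radius=1.0):
    """Euclidean projection of x onto {z : ||z||_1 <= radius}, Michelot-style."""
    x = np.asarray(x, dtype=float)
    v = np.abs(x)
    if v.sum() <= radius:
        return x.copy()                  # already inside the ball
    active = v
    while True:
        # Newton step on the piecewise-linear threshold equation.
        lam = (active.sum() - radius) / active.size
        kept = active[active > lam]      # drop entries the threshold zeroes out
        if kept.size == active.size:
            break                        # active set is stable: lam is the root
        active = kept
    # Soft-threshold the magnitudes and restore the signs.
    return np.sign(x) * np.maximum(v - lam, 0.0)

p = project_l1_ball([3.0, -1.0, 0.5], radius=2.0)
```

For this input the threshold settles at 1, which gives the projection [2, 0, 0]. Each pass costs time linear in the size of the active set, and the number of passes is what an accelerated variant aims to reduce.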