Dictionary learning for pitch estimation in speech signals
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168173 | MLSP 2017, pp. 1-6
F. Huang, P. Balázs
This paper presents an automatic approach to parameter training for a previously published sparsity-based pitch estimation method. In that method, the harmonic dictionary is a key parameter that must be carefully prepared beforehand, and constructing and labeling it requires extensive human supervision and involvement. In this study, we propose to employ dictionary learning algorithms to learn the dictionary directly from training data. We apply and compare three typical dictionary learning algorithms, namely the method of optimal directions (MOD), K-SVD, and online dictionary learning (ODL), and propose a post-processing method to label and adapt a learned dictionary for pitch estimation. Results show that MOD and properly initialized ODL (pi-ODL) can produce dictionaries that exhibit the desired harmonic structures for pitch estimation, and that the post-processing method significantly improves the pitch estimation performance of the learned dictionaries. The dictionary obtained with pi-ODL and post-processing attained pitch estimation accuracy close to the optimal performance of the manually constructed dictionary. These results show that dictionary learning is feasible and promising for this application.
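As a rough illustration of the kind of dictionary learning step discussed above, the following NumPy sketch runs MOD (alternating sparse coding and a least-squares dictionary update) on synthetic magnitude-spectrum frames; the data, atom count, sparsity level, and iteration count are illustrative assumptions, not the paper's configuration.

```python
# Minimal MOD (method of optimal directions) sketch on magnitude-spectrum frames.
# Synthetic data and all hyperparameters (n_atoms, sparsity, iterations) are
# illustrative assumptions, not the settings used in the paper.
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
n_bins, n_frames, n_atoms, sparsity = 128, 500, 40, 3

S = np.abs(rng.standard_normal((n_bins, n_frames)))   # stand-in for |STFT| frames
D = rng.random((n_bins, n_atoms))                      # random nonnegative init
D /= np.linalg.norm(D, axis=0, keepdims=True)

for it in range(20):
    # Sparse coding step: OMP with a fixed number of active atoms per frame.
    C = orthogonal_mp(D, S, n_nonzero_coefs=sparsity)  # (n_atoms, n_frames)
    # MOD dictionary update: D = S C^T (C C^T)^{-1}, computed via least squares.
    D = np.linalg.lstsq(C.T, S.T, rcond=None)[0].T
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    err = np.linalg.norm(S - D @ C) / np.linalg.norm(S)
    print(f"iter {it:2d}  relative reconstruction error {err:.3f}")
```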
{"title":"Dictionary learning for pitch estimation in speech signals","authors":"F. Huang, P. Balázs","doi":"10.1109/MLSP.2017.8168173","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168173","url":null,"abstract":"This paper presents an automatic approach for parameter training for a sparsity-based pitch estimation method that has been previously published. For this pitch estimation method, the harmonic dictionary is a key parameter that needs to be carefully prepared beforehand. In the original method, extensive human supervision and involvement are required to construct and label the dictionary. In this study, we propose to employ dictionary learning algorithms to learn the dictionary directly from training data. We apply and compare 3 typical dictionary learning algorithms, i.e., the method of optimized directions (MOD), K-SVD and online dictionary learning (ODL), and propose a post-processing method to label and adapt a learned dictionary for pitch estimation. Results show that MOD and properly initialized ODL (pi-ODL) can lead to dictionaries that exhibit the desired harmonic structures for pitch estimation, and the post-processing method can significantly improve performance of the learned dictionaries in pitch estimation. The dictionary obtained with pi-ODL and post-processing attained pitch estimation accuracy close to the optimal performance of the manual dictionary. It is positively shown that dictionary learning is feasible and promising for this application.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"49 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88947433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Difference-of-convex optimization for variational KL-corrected inference in Dirichlet process mixtures
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168159 | MLSP 2017, pp. 1-6
Rasmus Bonnevie, Mikkel N. Schmidt, Morten Mørup
Variational methods for approximate inference in Bayesian models optimize a lower bound on the marginal likelihood, but the resulting optimization problem is often nonconvex and high-dimensional. This can be alleviated by working in a collapsed domain where part of the parameter space is marginalized out. We consider the KL-corrected collapsed variational bound and apply it to Dirichlet process mixture models, allowing us to reduce the optimization space considerably. We find that the variational bound exhibits consistent and exploitable structure, allowing the application of difference-of-convex optimization algorithms. We show how this yields an interpretable fixed-point update algorithm in the collapsed setting for the Dirichlet process mixture model, and we connect this update formula to classical coordinate ascent updates, showing that the proposed improvement surprisingly reduces to the traditional scheme.
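To make the optimization style concrete, here is a generic difference-of-convex (DCA/CCCP) iteration on a toy one-dimensional objective f(x) = g(x) - h(x); the functions g and h below are invented for illustration and are unrelated to the collapsed variational bound itself.

```python
# Generic difference-of-convex (DCA / CCCP) iteration on a toy objective
# f(x) = g(x) - h(x) with both g and h convex. The specific g and h are
# illustrative assumptions, not the collapsed variational bound from the paper.
import numpy as np

g = lambda x: 0.5 * (x - 3.0) ** 2            # convex part
h = lambda x: np.log(1.0 + np.exp(2.0 * x))   # convex part being subtracted
h_grad = lambda x: 2.0 / (1.0 + np.exp(-2.0 * x))

x = 0.0
for k in range(30):
    # Linearize h at the current iterate and minimize the convex surrogate
    #   g(x) - h(x_k) - h'(x_k)(x - x_k);  stationarity gives g'(x) = h'(x_k).
    slope = h_grad(x)
    x = 3.0 + slope                            # closed-form argmin of the surrogate
    print(f"iter {k:2d}  x = {x:.5f}  f(x) = {g(x) - h(x):.5f}")
```

Each step minimizes a convex majorizer of f, so the printed objective decreases monotonically, which is the property the paper exploits for its fixed-point updates.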
{"title":"Difference-of-Convex optimization for variational kl-corrected inference in dirichlet process mixtures","authors":"Rasmus Bonnevie, Mikkel N. Schmidt, Morten Mørup","doi":"10.1109/MLSP.2017.8168159","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168159","url":null,"abstract":"Variational methods for approximate inference in Bayesian models optimise a lower bound on the marginal likelihood, but the optimization problem often suffers from being nonconvex and high-dimensional. This can be alleviated by working in a collapsed domain where a part of the parameter space is marginalized. We consider the KL-corrected collapsed variational bound and apply it to Dirichlet process mixture models, allowing us to reduce the optimization space considerably. We find that the variational bound exhibits consistent and exploitable structure, allowing the application of difference-of-convex optimization algorithms. We show how this yields an interpretable fixed-point update algorithm in the collapsed setting for the Dirichlet process mixture model. We connect this update formula to classical coordinate ascent updates, illustrating that the proposed improvement surprisingly reduces to the traditional scheme.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"3 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85756170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Infinite probabilistic latent component analysis for audio source separation
Kazuyoshi Yoshii, Eita Nakamura, Katsutoshi Itoyama, Masataka Goto
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168189 | MLSP 2017, pp. 1-6
This paper presents a statistical method of audio source separation based on a nonparametric Bayesian extension of probabilistic latent component analysis (PLCA). A major approach to audio source separation is nonnegative matrix factorization (NMF), which approximates the magnitude spectrum of a mixture signal at each frame as the weighted sum of a smaller number of source spectra. Another approach is PLCA, which regards the magnitude spectrogram as a two-dimensional histogram of "sound quanta" and classifies each quantum into one of the sources. While NMF has a physically natural interpretation, PLCA has also been used successfully for music signal analysis. To enable PLCA to estimate the number of sources, we propose Dirichlet process PLCA (DP-PLCA) and derive two kinds of learning methods based on variational Bayes and collapsed Gibbs sampling. Unlike existing learning methods for nonparametric Bayesian NMF based on the beta or gamma processes (BP-NMF and GaP-NMF), our sampling method can efficiently search for the optimal number of sources without truncating the number of sources under consideration. Experimental results showed that DP-PLCA is superior to GaP-NMF in terms of source number estimation.
Improving image classification with frequency domain layers for feature extraction
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168168 | MLSP 2017, pp. 1-6
J. Stuchi, M. A. Angeloni, R. F. Pereira, L. Boccato, G. Folego, Paulo V. S. Prado, R. Attux
Machine learning has become increasingly widespread in recent years. Major advances, especially in deep neural networks, have boosted the achievable performance in computer vision and signal processing applications. Although many techniques have been applied to deep architectures, the frequency domain has not been thoroughly explored in this field. In this context, this paper presents a new method for extracting discriminative features based on Fourier analysis. The proposed frequency extractor layer can be combined with deep architectures in order to improve image classification. Computational experiments were performed on the face liveness detection problem, yielding better results than those reported in the literature for the grandtest protocol of the Replay-Attack database. This paper also aims to raise the discussion on how frequency domain layers can be used in deep architectures to further improve network performance.
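A minimal sketch of what a frequency-domain feature extraction step can look like: 2-D FFT log-magnitude features computed per image and fed to a plain linear classifier on toy data. The normalization and classifier choice are assumptions for illustration, not the layer evaluated in the paper.

```python
# Frequency-domain feature extraction: 2-D FFT log-magnitude per image,
# flattened and fed to a linear classifier. The normalization and classifier
# choice are illustrative assumptions, not the architecture from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fourier_features(images):
    """images: (n, H, W) array -> (n, H*W) log-magnitude spectra (DC centered)."""
    spectra = np.fft.fftshift(np.fft.fft2(images, axes=(1, 2)), axes=(1, 2))
    return np.log1p(np.abs(spectra)).reshape(len(images), -1)

rng = np.random.default_rng(0)
# Toy stand-in data: class 0 = flat (low-frequency) images, class 1 = noise.
flat = rng.random((50, 1, 1)) * np.ones((50, 32, 32))
noisy = rng.random((50, 32, 32))
X = fourier_features(np.concatenate([flat, noisy]))
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```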
{"title":"Improving image classification with frequency domain layers for feature extraction","authors":"J. Stuchi, M. A. Angeloni, R. F. Pereira, L. Boccato, G. Folego, Paulo V. S. Prado, R. Attux","doi":"10.1109/MLSP.2017.8168168","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168168","url":null,"abstract":"Machine learning has been increasingly used in current days. Great improvements, especially in deep neural networks, helped to boost the achievable performance in computer vision and signal processing applications. Although different techniques were applied for deep architectures, the frequency domain has not been thoroughly explored in this field. In this context, this paper presents a new method for extracting discriminative features according to the Fourier analysis. The proposed frequency extractor layer can be combined with deep architectures in order to improve image classification. Computational experiments were performed on face liveness detection problem, yielding better results than those presented in the literature for the grandtest protocol of Replay-Attack Database. This paper also aims to raise the discussion on how frequency domain layers can be used in deep architectures to further improve the network performance.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90542996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Text to image generative model using constrained embedding space mapping
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168111 | MLSP 2017, pp. 1-6
Subhajit Chaudhury, Sakyasingha Dasgupta, Asim Munawar, Md. A. Salam Khan, Ryuki Tachibana
We present a conditional generative method that maps low-dimensional embeddings of images and natural language to a common latent space, thereby extracting semantic relationships between them. The embedding specific to each modality is first extracted, and a constrained optimization procedure is then performed to project the two embedding spaces onto a common manifold. Based on this, we present a method to learn the conditional probability distribution of the two embedding spaces by mapping them to a shared latent space and generating the individual embeddings back from this common space. However, in order to enable independent conditional inference, i.e., separately extracting the corresponding embeddings from the common latent representation, we deploy a proxy-variable trick in which the single shared latent space is replaced by two separate latent spaces. We design the objective function so that, during training, these separate spaces are forced to lie close to each other by minimizing the Euclidean distance between their distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional color attributes, thereby enabling the generation of specific colored images from the respective text data.
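The sketch below captures only the general idea of projecting two modality-specific embeddings to latent codes that are pulled together by a distance penalty while each can reconstruct its own embedding; it collapses the paper's proxy-variable construction into a plain L2 alignment term, and all dimensions, networks, and the random "embeddings" are assumptions.

```python
# Sketch of mapping image and text embeddings to nearby latent spaces with
# reconstruction plus an L2 alignment penalty. This simplifies the paper's
# proxy-variable construction to a plain distance term; all sizes, networks,
# and the random "embeddings" below are illustrative assumptions.
import torch
import torch.nn as nn

d_img, d_txt, d_lat = 256, 128, 32
enc_img, dec_img = nn.Linear(d_img, d_lat), nn.Linear(d_lat, d_img)
enc_txt, dec_txt = nn.Linear(d_txt, d_lat), nn.Linear(d_lat, d_txt)
params = (list(enc_img.parameters()) + list(dec_img.parameters())
          + list(enc_txt.parameters()) + list(dec_txt.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

x_img = torch.randn(64, d_img)   # stand-in for precomputed image embeddings
x_txt = torch.randn(64, d_txt)   # stand-in for paired text embeddings

for step in range(200):
    z_img, z_txt = enc_img(x_img), enc_txt(x_txt)
    loss = (nn.functional.mse_loss(dec_img(z_img), x_img)       # reconstruct image embedding
            + nn.functional.mse_loss(dec_txt(z_txt), x_txt)     # reconstruct text embedding
            + nn.functional.mse_loss(z_img, z_txt))             # pull the two latents together
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss:", float(loss))
```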
{"title":"Text to image generative model using constrained embedding space mapping","authors":"Subhajit Chaudhury, Sakyasingha Dasgupta, Asim Munawar, Md. A. Salam Khan, Ryuki Tachibana","doi":"10.1109/MLSP.2017.8168111","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168111","url":null,"abstract":"We present a conditional generative method that maps low-dimensional embeddings of image and natural language to a common latent space hence extracting semantic relationships between them. The embedding specific to a modality is first extracted and subsequently a constrained optimization procedure is performed to project the two embedding spaces to a common manifold. Based on this, we present a method to learn the conditional probability distribution of the two embedding spaces; first, by mapping them to a shared latent space and generating back the individual embeddings from this common space. However, in order to enable independent conditional inference for separately extracting the corresponding embeddings from the common latent space representation, we deploy a proxy variable trick — wherein, the single shared latent space is replaced by two separate latent spaces. We design an objective function, such that, during training we can force these separate spaces to lie close to each other, by minimizing the Euclidean distance between their distribution functions. Experimental results demonstrate that the learned joint model can generalize to learning concepts of double MNIST digits with additional attributes of colors, thereby enabling the generation of specific colored images from the respective text data.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"62 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85222872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIML-AI: Mixed-supervision multi-instance multi-label learning with auxiliary information
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168107 | MLSP 2017, pp. 1-6
Tarn Nguyen, R. Raich, Xiaoli Z. Fern, Anh T. Pham
Manual labeling of individual instances is time-consuming. This is commonly addressed by labeling a bag of instances with a single common label or label set. However, this approach is still costly for large datasets. In this paper, we propose a mixed-supervision multi-instance multi-label learning model for learning from easily available metadata (MIML-AI). Such auxiliary information is normally collected automatically with the data, e.g., image location or document author name. We propose a discriminative graphical model with exact inference to train a classifier from auxiliary label information and a small number of labeled bags. This strategy uses metadata as a means of providing weaker labels, as an alternative to intensive manual labeling. Experiments on real data illustrate the effectiveness of the proposed method relative to current approaches, which do not use the information from bags that carry only metadata labels.
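For context on bag-level weak supervision, the following toy example trains a standard noisy-OR multi-instance classifier from bag labels only; it is not the MIML-AI graphical model, and the synthetic data and sizes are assumptions.

```python
# A standard noisy-OR multi-instance baseline trained from bag-level labels only.
# This is NOT the MIML-AI graphical model from the paper; it only illustrates
# learning from weak, bag-level supervision. Data and sizes are synthetic.
import torch

gen = torch.Generator().manual_seed(0)
n_bags, bag_size, d = 200, 10, 5
X = torch.randn(n_bags, bag_size, d, generator=gen)
w_true = torch.randn(d, generator=gen)
# A bag is positive if any of its instances scores highly under the true weights.
y = ((X @ w_true) > 3.0).any(dim=1).float()

w = torch.zeros(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.05)

for step in range(300):
    p_inst = torch.sigmoid(X @ w + b)                  # instance-level probabilities
    p_bag = 1.0 - torch.prod(1.0 - p_inst, dim=1)      # noisy-OR bag probability
    loss = torch.nn.functional.binary_cross_entropy(p_bag.clamp(1e-6, 1 - 1e-6), y)
    opt.zero_grad(); loss.backward(); opt.step()

print("bag-level accuracy:", ((p_bag > 0.5).float() == y).float().mean().item())
```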
{"title":"MIML-AI: Mixed-supervision multi-instance multi-label learning with auxiliary information","authors":"Tarn Nguyen, R. Raich, Xiaoli Z. Fern, Anh T. Pham","doi":"10.1109/MLSP.2017.8168107","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168107","url":null,"abstract":"Manual labeling of individual instances is time-consuming. This is commonly resolved by labeling a bag-of-instances with a single common label or label-set. However, this approach is still time-costly for large datasets. In this paper, we propose a mixed-supervision multi-instance multi-label learning model for learning from easily available meta data information (MIML-AI). This auxiliary information is normally collected automatically with the data, e.g., an image location information or a document author name. We propose a discriminative graphical model with exact inferences to train a classifier based on auxiliary label information and a small number of labeled bags. This strategy utilizes meta data as means of providing a weaker label as an alternative to intensive manual labeling. Experiment on real data illustrates the effectiveness of our proposed method relative to current approaches, which do not use the information from bags that contain only meta-data label information.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"146 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78590850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discriminating schizophrenia from normal controls using resting state functional network connectivity: A deep neural network and layer-wise relevance propagation method
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168179 | MLSP 2017, pp. 1-6
Weizheng Yan, S. Plis, V. Calhoun, Shengfeng Liu, R. Jiang, T. Jiang, J. Sui
Deep learning has gained considerable attention in the scientific community, breaking benchmark records in many fields such as speech and visual recognition [1]. Motivated by extending these advances to brain imaging classification, we propose a framework, called "deep neural network (DNN) + layer-wise relevance propagation (LRP)", to distinguish schizophrenia patients (SZ) from healthy controls (HCs) using functional network connectivity (FNC). 1100 Chinese subjects from 7 sites are included, each with a 50×50 FNC matrix obtained from group ICA on resting-state fMRI data. The proposed DNN+LRP not only improves classification accuracy significantly compared to four state-of-the-art classification methods (84% vs. less than 79%, 10-fold cross-validation) but also enables identification of the FNC patterns that contribute most to SZ classification, which cannot easily be traced back with general DNN models. By conducting LRP, we identified the FNC patterns that exhibit the highest discriminative power for SZ classification. More importantly, under leave-one-site-out cross-validation (6 sites for training, 1 site for testing, repeated 7 times), the cross-site classification accuracy reached 82%, suggesting high robustness and generalization performance of the proposed method, promising wide utility in the community and great potential for biomarker identification of brain disorders.
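The relevance propagation step can be sketched generically: the snippet below applies the LRP epsilon rule to a small fully connected ReLU network with random weights, standing in for an FNC classifier; the network and data are placeholders, not the trained model from the study.

```python
# LRP epsilon rule through a tiny fully connected ReLU network with random
# weights, standing in for an FNC classifier. This is a generic illustration,
# not the trained DNN from the study.
import numpy as np

rng = np.random.default_rng(0)
sizes = [20, 16, 8, 2]                                    # toy layer widths
Ws = [rng.standard_normal((m, n)) * 0.3 for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.standard_normal(sizes[0])                         # stand-in for FNC features
activations = [x]
for i, W in enumerate(Ws):                                # forward pass (ReLU hidden layers, linear output)
    pre = activations[-1] @ W
    activations.append(np.maximum(pre, 0.0) if i < len(Ws) - 1 else pre)

cls = int(np.argmax(activations[-1]))
R = np.zeros(sizes[-1])
R[cls] = activations[-1][cls]                             # relevance starts at the predicted class

eps = 1e-6
for W, a in zip(reversed(Ws), reversed(activations[:-1])):
    z = a @ W
    z = z + eps * np.where(z >= 0, 1.0, -1.0)             # stabilized denominators
    s = R / z
    R = a * (W @ s)                                       # epsilon rule: redistribute relevance

print("input relevance (one value per FNC feature):", np.round(R, 2))
```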
{"title":"Discriminating schizophrenia from normal controls using resting state functional network connectivity: A deep neural network and layer-wise relevance propagation method","authors":"Weizheng Yan, S. Plis, V. Calhoun, Shengfeng Liu, R. Jiang, T. Jiang, J. Sui","doi":"10.1109/MLSP.2017.8168179","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168179","url":null,"abstract":"Deep learning has gained considerable attention in the scientific community, breaking benchmark records in many fields such as speech and visual recognition [1]. Motivated by extending advancement of deep learning approaches to brain imaging classification, we propose a framework, called “deep neural network (DNN)+ layer-wise relevance propagation (LRP)”, to distinguish schizophrenia patients (SZ) from healthy controls (HCs) using functional network connectivity (FNC). 1100 Chinese subjects of 7 sites are included, each with a 50∗50 FNC matrix resulted from group ICA on resting-state fMRI data. The proposed DNN+LRP not only improves classification accuracy significantly compare to four state-of-the-art classification methods (84% vs. less than 79%, 10 folds cross validation) but also enables identification of the most contributing FNC patterns related to SZ classification, which cannot be easily traced back by general DNN models. By conducting LRP, we identified the FNC patterns that exhibit the highest discriminative power in SZ classification. More importantly, when using leave-one-site-out cross validation (using 6 sites for training, 1 site for testing, 7 times in total), the cross-site classification accuracy reached 82%, suggesting high robustness and generalization performance of the proposed method, promising a wide utility in the community and great potentials for biomarker identification of brain disorders.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"14 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75381907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A locally optimal algorithm for estimating a generating partition from an observed time series
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168162 | MLSP 2017, pp. 1-6
David J. Miller, Najah F. Ghalyan, A. Ray
Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet uniquely specifies the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to characterize the (possibly unknown) dynamical system, as well as for time series classification and anomaly detection. Previous work attempts to minimize a clustering objective function that measures the discrepancy between a set of reconstruction values and the points of the time series. Unfortunately, the resulting algorithm is non-convergent, with no guarantee of finding even locally optimal solutions; the problem lies in a heuristic "nearest neighbor" symbol assignment step. We instead introduce a new, locally optimal algorithm that applies iterative "nearest neighbor" symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the time series is achieved. While some approaches use vector quantization to partition the state space, our approach only ensures a partition in the space consisting of the entire time series (effectively, clustering in an infinite-dimensional space). Our approach also amounts to a novel type of sliding-block lossy source coding. We demonstrate improvement, with respect to several measures, over a popular method from the literature.
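As a loose illustration of the discrepancy objective and the descent property discussed above, the toy below symbolizes a logistic-map time series by alternating nearest-value symbol assignment with reconstruction-value updates and checks that the discrepancy never increases; this Lloyd-style scheme is not the authors' generating-partition algorithm, which operates on the full sequence space.

```python
# Toy symbolization by alternating nearest-value symbol assignment and
# reconstruction-value updates, with the discrepancy monitored to confirm
# monotone descent. Only a Lloyd-style illustration of the objective above,
# not the authors' generating-partition algorithm.
import numpy as np

# Logistic-map time series as a simple deterministic dynamical system.
x = np.empty(2000); x[0] = 0.4
for t in range(1, len(x)):
    x[t] = 3.9 * x[t - 1] * (1.0 - x[t - 1])

K = 4                                          # alphabet size (assumed)
recon = np.linspace(x.min(), x.max(), K)       # initial reconstruction values

prev = np.inf
for it in range(20):
    symbols = np.argmin(np.abs(x[:, None] - recon[None, :]), axis=1)  # assignment step
    for k in range(K):                         # reconstruction-value update step
        if np.any(symbols == k):
            recon[k] = x[symbols == k].mean()
    disc = np.sum((x - recon[symbols]) ** 2)   # clustering discrepancy
    assert disc <= prev + 1e-9                 # monotone descent check
    prev = disc
    print(f"iter {it:2d}  discrepancy {disc:.4f}")
```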
{"title":"A locally optimal algorithm for estimating a generating partition from an observed time series","authors":"David J. Miller, Najah F. Ghalyan, A. Ray","doi":"10.1109/MLSP.2017.8168162","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168162","url":null,"abstract":"Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet uniquely specifies the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. Previous work attemps to minimize a clustering objective function that measures discrepancy between a set of reconstruction values and the points from the time series. Unfortunately, the resulting algorithm is non-convergent, with no guarantee of finding even locally optimal solutions. The problem is a heuristic “nearest neighbor” symbol assignment step. Alternatively, we introduce a new, locally optimal algorithm. We apply iterative “nearest neighbor” symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the time series is achieved. While some approaches use vector quantization to partition the state space, our approach only ensures a partition in the space consisting of the entire time series (effectively, clustering in an infinite-dimensional space). Our approach also amounts to a novel type of sliding block lossy source coding. We demonstrate improvement, with respect to several measures, over a popular method used in the literature.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"136 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89243951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Bayesian forecasting and anomaly detection framework for vehicular monitoring networks
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168151 | MLSP 2017, pp. 1-6
Maria Scalabrin, Matteo Gadaleta, Riccardo Bonetto, M. Rossi
In this paper, we are concerned with the automated, runtime analysis of vehicular data from large-scale traffic monitoring networks. This problem is tackled through localized, small-size Bayesian networks (BNs), which are used to capture the spatio-temporal relationships underpinning traffic data from nearby road links. A dedicated BN is set up, trained, and tested for each road in the monitored geographical map. The joint probability distribution between the cause nodes and the effect node in the BN is tracked through a Gaussian mixture model (GMM), whose parameters are estimated via Bayesian variational inference (BVI). Forecasting and anomaly detection are performed on statistical measures derived at runtime from the trained GMMs. Our design choices lead to several advantages: the approach is scalable, as a small-size BN is associated with and independently trained for each road, and the localized nature of the framework allows flagging atypical behaviors at their point of origin in the monitored geographical map. The effectiveness of the proposed framework is tested on a large dataset from a real network deployment, comparing its prediction performance with that of selected regression algorithms from the literature and quantifying its anomaly detection capabilities.
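A minimal sketch of the runtime anomaly-scoring idea, assuming a variational-Bayes Gaussian mixture fitted to the joint speeds of neighbouring links and a target link; the feature layout, threshold, and synthetic data are assumptions, and the paper's per-road Bayesian-network structure is not modeled here.

```python
# Per-road anomaly scoring with a variational-Bayes Gaussian mixture: fit the
# joint density of upstream-link and target-link speeds, then flag
# low-likelihood observations at runtime. Feature layout, threshold, and the
# synthetic data are assumptions; the per-road BN structure is not modeled.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Columns: speeds on two upstream links (cause nodes) and on the target link.
cov = [[25, 15, 12], [15, 25, 14], [12, 14, 25]]
normal = rng.multivariate_normal([50, 48, 47], cov, 2000)

gmm = BayesianGaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(normal)

threshold = np.quantile(gmm.score_samples(normal), 0.01)   # 1% lowest-likelihood cut

runtime = np.vstack([normal[:5], [[50, 49, 5]]])            # last row: abrupt slowdown
flags = gmm.score_samples(runtime) < threshold
print("anomaly flags:", flags)
```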
{"title":"A Bayesian forecasting and anomaly detection framework for vehicular monitoring networks","authors":"Maria Scalabrin, Matteo Gadaleta, Riccardo Bonetto, M. Rossi","doi":"10.1109/MLSP.2017.8168151","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168151","url":null,"abstract":"In this paper, we are concerned with the automated and runtime analysis of vehicular data from large scale traffic monitoring networks. This problem is tackled through localized and small-size Bayesian networks (BNs), which are utilized to capture the spatio-temporal relationships underpinning traffic data from nearby road links. A dedicated BN is set up, trained, and tested for each road in the monitored geographical map. The joint probability distribution between the cause nodes and the effect node in the BN is tracked through a Gaussian Mixture Model (GMM), whose parameters are estimated via Bayesian Variational Inference (BVI). Forecasting and anomaly detection are performed on statistical measures derived at runtime by the trained GMMs. Our design choices lead to several advantages: the approach is scalable as a small-size BN is associated with and independently trained for each road and the localized nature of the framework allows flagging atypical behaviors at their point of origin in the monitored geographical map. The effectiveness of the proposed framework is tested using a large dataset from a real network deployment, comparing its prediction performance with that of selected regression algorithms from the literature, while also quantifying its anomaly detection capabilities.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"123 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83473001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-blind speech enhancement based on recurrent neural network for source separation and dereverberation
Masaya Wake, Yoshiaki Bando, M. Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168191 | MLSP 2017, pp. 1-6
This paper describes a semi-blind speech enhancement method using a semi-blind recurrent neural network (SB-RNN) for human-robot speech interaction. When a robot interacts with a human through speech, it has access not only to the audio signals recorded by its own microphone but also to the speech signals it produces itself, which can be exploited for semi-blind speech enhancement. The SB-RNN consists of two cascaded modules: a semi-blind source separation module and a blind dereverberation module. Each module has a recurrent layer to capture the temporal correlations of speech signals. The SB-RNN is trained in a multi-task manner, i.e., isolated echoic speech signals are used as teacher signals for the output of the separation module, in addition to isolated unechoic signals for the output of the dereverberation module. Experimental results showed that the source-to-distortion ratio was improved by 2.30 dB on average compared with a conventional method based on semi-blind independent component analysis. The results also demonstrated the effectiveness of the modularization of the network, multi-task learning, the recurrent structure, and semi-blind source separation.