Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168131
Cuong D. Tran, Ognjen Rudovic, V. Pavlovic
We study the task of unsupervised domain adaptation, where no labeled data from the target domain is provided during training time. To deal with the potential discrepancy between the source and target distributions, both in features and labels, we exploit a copula-based regression framework. The benefits of this approach are two-fold: (a) it allows us to model a broader range of conditional predictive densities beyond the common exponential family; (b) we show how to leverage Sklar's theorem, the essence of the copula formulation relating the joint density to the copula dependency functions, to find effective feature mappings that mitigate the domain mismatch. By transforming the data to a copula domain, we show on a number of benchmark datasets (including human emotion estimation), and using different regression models for prediction, that we can achieve a more robust and accurate estimation of target labels, compared to recently proposed feature transformation (adaptation) methods.
Title: Unsupervised domain adaptation with copula models. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
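The copula-domain transformation mentioned in the abstract can be illustrated with the probability integral transform: each feature is passed through its own marginal CDF, so that source and target features share uniform marginals and only the dependence structure remains. The following is a minimal sketch, assuming an empirical-CDF (rank) estimate of the marginals; the function name `to_copula_domain` and the toy data are hypothetical and are not taken from the paper.

```python
def to_copula_domain(values):
    """Map one feature column to (0, 1) via its empirical CDF (rank transform).

    Ties are broken by position; dividing by n + 1 keeps outputs strictly
    inside (0, 1). This is an illustrative estimator, not the paper's.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


# Toy example: the target feature is a shifted and rescaled copy of the source.
source_feature = [10.0, 30.0, 20.0]
target_feature = [1005.0, 1015.0, 1010.0]

u_source = to_copula_domain(source_feature)
u_target = to_copula_domain(target_feature)
print(u_source)  # [0.25, 0.75, 0.5]
print(u_target)  # [0.25, 0.75, 0.5]
```

Because the rank transform is invariant to monotone changes of scale, the two domains coincide after mapping even though their raw feature ranges do not overlap.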
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168172
Khalil Elkhalil, A. Kammoun, Romain Couillet, T. Al-Naffouri, Mohamed-Slim Alouini
This paper carries out a large-dimensional analysis of the standard regularized quadratic discriminant analysis (QDA) classifier, designed under the assumption that the data arise from a Gaussian mixture model. The analysis relies on fundamental results from random matrix theory (RMT) in the regime where the number of features and the number of training samples in each class grow large at the same pace. Under some mild assumptions, we show that the classification error converges asymptotically to a deterministic quantity that depends only on the covariances and means associated with each class and on the problem dimensions. Such a result permits a better understanding of the performance of regularized QDA and can be used to determine the optimal regularization parameter that minimizes the misclassification error probability. Although valid only for Gaussian data, our theoretical findings are shown to predict with high accuracy the performance achieved on real data sets drawn from popular databases, thereby making an interesting connection between theory and practice.
Title: Asymptotic performance of regularized quadratic discriminant analysis based classifiers. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
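To make the role of the regularization parameter concrete, here is a minimal sketch of a QDA decision rule for 2-D data. It assumes the class covariance is regularized as `cov + gamma * I`, which is one common choice; the abstract does not state the exact regularizer, so this form, the function names, and the toy class parameters are all assumptions.

```python
import math


def qda_score(x, mean, cov, prior, gamma):
    """Quadratic discriminant score of a 2-D point for one Gaussian class.

    The class covariance is regularized as cov + gamma * I (an assumed form).
    """
    a = cov[0][0] + gamma
    b = cov[0][1]
    c = cov[1][0]
    d = cov[1][1] + gamma
    det = a * d - b * c
    # Inverse of the 2x2 regularized covariance.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    maha = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)


def classify(x, classes, gamma):
    """Return the index of the class with the largest discriminant score."""
    scores = [qda_score(x, m, s, p, gamma) for (m, s, p) in classes]
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical two-class problem: (mean, covariance, prior) per class.
classes = [
    ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    ([3.0, 3.0], [[2.0, 0.5], [0.5, 1.0]], 0.5),
]
label_a = classify([0.2, -0.1], classes, gamma=0.1)
label_b = classify([2.8, 3.1], classes, gamma=0.1)
print(label_a, label_b)
```

In practice the means and covariances would be sample estimates, and `gamma` would be tuned; the asymptotic error expression described in the abstract is what would replace a cross-validation search for that value.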
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168128
Suman Kumar Choudhury, R. P. Padhy, P. K. Sa
This paper presents a fully convolutional architecture for pedestrian detection. The DenseNet model is incorporated into the Faster R-CNN framework to extract deep convolutional features. A two-phase approach is suggested to reduce the false positives caused by hard negative backgrounds. Feature maps from multiple intermediate layers are taken into consideration to facilitate small-scale detection. The proposed method and a few competing schemes are compared on two benchmark datasets. The results demonstrate the potential of our approach in addressing real-world challenges.
Title: Faster R-CNN with DenseNet for scale aware pedestrian detection vis-à-vis hard negative suppression. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168112
Tahar Nabil, F. Roueff, J. Jicquel, A. Girard
This article deals with the identification of a dynamic building model from on-site input-output records. In practice, the solar gains, a key input, are often unobserved due to the cost of the associated sensor. We propose replacing this sensor with a cheap outdoor temperature sensor exposed to the sun. Our assumption is that the temperature bias between this sensor and a second, sheltered sensor is an indirect observation of the solar flux. We derive a novel state-space model for the outdoor temperature bias, in which sudden changes in the weather conditions are accounted for by occasional high-variance increments of the hidden state. The magnitude of the high values and the times at which they occur are estimated with an ℓ1-regularized maximum likelihood approach. Finally, this model is appended to a thermal building model based on an equivalent RC network, forming a conditionally linear Gaussian state-space system. We apply the Expectation-Maximization algorithm with Rao-Blackwellised particle smoothing in order to learn the thermal model. We are able, despite the indirect observation of the solar flux, to correctly estimate the physical parameters of the building, in particular the static coefficients and the fast time constant.
Title: Identification of a thermal building model by learning the dynamics of the solar flux. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
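As a rough illustration of what an equivalent RC network means for a building, the sketch below simulates the simplest possible case: a single resistance and a single capacitance driven by the outdoor temperature and a solar gain term. The network order, the parameter names (`r`, `c`, `gain`), and all numeric values are hypothetical; the abstract does not specify the model actually identified.

```python
def simulate_rc(t_indoor0, t_outdoor, solar_flux, r, c, gain, dt):
    """Simulate a 1R1C thermal model with forward-Euler steps.

    C * dT/dt = (T_out - T) / R + gain * solar_flux
    """
    temps = [t_indoor0]
    for t_out, phi in zip(t_outdoor, solar_flux):
        t_in = temps[-1]
        d_temp = ((t_out - t_in) / r + gain * phi) / c
        temps.append(t_in + dt * d_temp)
    return temps


# Time constant R * C = 1e4 s; 2000 steps of 60 s cover about 12 time constants.
steps = 2000
no_sun = simulate_rc(20.0, [5.0] * steps, [0.0] * steps,
                     r=0.01, c=1.0e6, gain=1.0, dt=60.0)
with_sun = simulate_rc(20.0, [5.0] * steps, [300.0] * steps,
                       r=0.01, c=1.0e6, gain=1.0, dt=60.0)
print(no_sun[-1], with_sun[-1])
```

Under these assumptions the steady-state offset between the two runs equals `r * gain * solar_flux`, which shows why an unobserved solar input biases the estimated parameters unless it is modeled.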
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168164
Y. Kawaguchi, Takashi Endo
We aim to reduce the cost of sound monitoring for machinery maintenance by reducing the sampling rate, i.e., by sub-Nyquist sampling. Monitoring based on sub-Nyquist sampling requires two sub-systems: an on-site sub-system for sampling machinery sounds at a low rate and an off-site sub-system for detecting anomalies from the subsampled signal. This paper proposes a method for achieving both sub-systems. First, the proposed method uses non-uniform sampling to encode frequencies higher than the Nyquist frequency. Second, the method applies a long short-term memory (LSTM)-based autoencoder network to detect anomalies. The novelty of the proposed network is that the subsampled time-domain signal is demultiplexed and received as input in an end-to-end manner, enabling anomaly detection from the subsampled signal. Experimental results indicate that our method is suitable for anomaly detection from the subsampled signal.
Title: How can we detect anomalies from subsampled audio signals? Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
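The reason non-uniform sampling can retain information above the Nyquist frequency can be seen in a small experiment. The sketch below samples two tones that alias onto each other under uniform sampling and shows that a known, non-uniform timing pattern separates them. The sampling rate, the tone frequencies, and the repeating offset pattern are hypothetical choices for illustration, not the scheme used in the paper.

```python
import math


def sample_tone(freq_hz, times):
    """Sample a unit-amplitude sine tone at the given time instants."""
    return [math.sin(2.0 * math.pi * freq_hz * t) for t in times]


rate_hz = 10.0          # low sampling rate, so the Nyquist frequency is 5 Hz
n_samples = 30
uniform_times = [n / rate_hz for n in range(n_samples)]
# Hypothetical non-uniform pattern: a small, known, repeating timing offset.
nonuniform_times = [n / rate_hz + 0.01 * (n % 3) for n in range(n_samples)]

low, high = 2.0, 12.0   # 12 Hz aliases onto 2 Hz under uniform 10 Hz sampling


def max_gap(times):
    """Largest sample-wise difference between the two tones."""
    a = sample_tone(low, times)
    b = sample_tone(high, times)
    return max(abs(x - y) for x, y in zip(a, b))


gap_uniform = max_gap(uniform_times)
gap_nonuniform = max_gap(nonuniform_times)
print(gap_uniform, gap_nonuniform)
```

With uniform timing the two tones produce numerically identical samples, so no detector downstream could tell them apart; with the offset pattern the sample sequences differ clearly at the same average rate.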
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168170
Rintaro Ikeshita, M. Togami, Y. Kawaguchi, Yusuke Fujita, Kenji Nagamatsu
To improve the performance of blind audio source separation of convolutive mixtures, the local Gaussian model (LGM) with full-rank covariance matrices proposed by Duong et al. is extended. The previous model assumes that all sources contribute to each time-frequency slot, which may fail to capture the characteristics of signals with many intermittent silent periods. A constraint on the source sets that contribute to each time-frequency slot is therefore explicitly introduced. This approach can be regarded as a relaxation of the sparsity constraint in the conventional time-frequency mask. The proposed model is jointly optimized over the original local Gaussian model parameters, the relaxed version of the time-frequency mask, and a permutation alignment, leading to a robust permutation-free algorithm. We also present a novel multi-channel Wiener filter weighted by a relaxed version of the time-frequency mask. Experimental results on noisy speech signals show that the proposed model is effective compared with the original local Gaussian model and is comparable to its extension, the multi-channel nonnegative matrix factorization.
Title: Local Gaussian model with source-set constraints in audio source separation. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168143
J. Simm, Adam Arany, Pooya Zakeri, Tom Haber, J. Wegner, V. Chupakhin, H. Ceulemans, Y. Moreau
Bayesian matrix factorization is a method of choice for making predictions for large-scale incomplete matrices, due to the availability of efficient Gibbs sampling schemes and its robustness to overfitting. In this paper, we consider the factorization of large-scale matrices with high-dimensional side information. However, sampling the link matrix for the side information with standard approaches costs O(F^3) time, where F is the dimensionality of the features. To overcome this limitation, we first propose a prior for the link matrix whose strength is proportional to the scale of the latent variables. Second, using this prior, we derive an efficient sampler with linear complexity in the number of non-zeros, O(N_nz), by leveraging Krylov subspace methods such as block conjugate gradient, allowing us to handle million-dimensional side information. We demonstrate the effectiveness of the proposed method on a drug-protein interaction prediction task.
Title: Macau: Scalable Bayesian factorization with high-dimensional side information using MCMC. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
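The complexity argument in the abstract rests on never forming the F x F system matrix: a Krylov method only needs matrix-vector products, and with sparse side information each product costs time linear in the number of non-zeros. The sketch below shows this with plain (single right-hand-side) conjugate gradient on a ridge-type system; the paper uses a block variant inside a Gibbs sampler, which is not reproduced here. The sparse row format, function names, and toy data are assumptions.

```python
def apply_operator(rows, v, lam):
    """Compute (X^T X + lam * I) v with X stored as sparse rows.

    Each row is a dict {column_index: value}; cost is linear in the non-zeros.
    """
    out = [lam * vj for vj in v]
    for row in rows:
        dot = sum(val * v[j] for j, val in row.items())   # (X v)_i
        for j, val in row.items():                        # accumulate X^T (X v)
            out[j] += val * dot
    return out


def conjugate_gradient(rows, b, lam, tol=1e-10, max_iter=100):
    """Solve (X^T X + lam * I) w = b without forming the matrix."""
    w = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    p = list(r)
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old < tol:             # squared residual norm small enough
            break
        ap = apply_operator(rows, p, lam)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w


# Toy sparse side-information matrix X (3 samples, 4 features).
rows = [{0: 1.0, 2: 2.0}, {1: 1.0}, {0: 3.0, 3: 1.0}]
b = [1.0, 2.0, 3.0, 4.0]
w = conjugate_gradient(rows, b, lam=0.5)
residual = [bi - ai for bi, ai in zip(b, apply_operator(rows, w, 0.5))]
print(max(abs(ri) for ri in residual))
```

A direct solve would require building and factorizing the dense F x F matrix, which is where the cubic cost mentioned in the abstract comes from.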
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168117
S. Mimilakis, K. Drossos, T. Virtanen, G. Schuller
The objective of deep learning methods based on encoder-decoder architectures for music source separation is to approximate either ideal time-frequency masks or spectral representations of the target music source(s). The spectral representations are then used to derive time-frequency masks. In this work, we introduce a method to directly learn time-frequency masks from an observed mixture magnitude spectrum. We employ recurrent neural networks and train them using prior knowledge only for the magnitude spectrum of the target source. To assess the performance of the proposed method, we focus on the task of singing voice separation. The results of an objective evaluation show that our proposed method provides results comparable to deep-learning-based methods that operate over complicated signal representations. Compared to previous methods that approximate time-frequency masks, our method improves the signal-to-distortion ratio by an average of 3.8 dB.
Title: A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
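The masking operation at the center of this abstract is an element-wise product between a time-frequency mask and the mixture magnitude spectrum. The sketch below shows only that operation, using an oracle ratio mask computed from known sources; in the paper the mask is produced by a recurrent network instead. The function names, the toy magnitudes, and the assumption that magnitudes add in the mixture are all illustrative.

```python
def ratio_mask(target_mag, other_mag, eps=1e-12):
    """Element-wise ratio mask in [0, 1] for one time-frequency grid."""
    return [[t / (t + o + eps) for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_mag, other_mag)]


def apply_mask(mask, mixture_mag):
    """Filter the mixture magnitude spectrum with a time-frequency mask."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_mag)]


# Toy magnitudes: 2 time frames x 3 frequency bins (hypothetical values).
voice = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.2]]
accompaniment = [[0.1, 0.4, 0.3], [0.5, 0.0, 0.6]]
# Simplifying assumption: magnitudes add in the mixture.
mixture = [[v + a for v, a in zip(v_row, a_row)]
           for v_row, a_row in zip(voice, accompaniment)]

mask = ratio_mask(voice, accompaniment)
estimate = apply_mask(mask, mixture)
print(estimate)
```

Training a network so that its output, multiplied by the mixture, matches the target magnitude means supervision is needed only on the target spectrum and never on the mask itself, which is the property the abstract highlights.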
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168171
R. Hostettler, S. Särkkä, S. Godsill
In this paper, we consider parameter estimation in latent, spatio-temporal Gaussian processes using particle Markov chain Monte Carlo methods. In particular, we use a spectral decomposition of the covariance function to obtain a high-dimensional state-space representation of the Gaussian process, which is assumed to be observed through a nonlinear, non-Gaussian likelihood. We develop a Rao-Blackwellized particle Gibbs sampler to sample the state trajectory and show how to sample the hyperparameters and any parameters of the likelihood. The proposed method is evaluated on a spatio-temporal population model, and its predictive performance is assessed using leave-one-out cross-validation.
Title: Rao-Blackwellized particle MCMC for parameter estimation in spatio-temporal Gaussian processes. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
Pub Date: 2017-09-01; DOI: 10.1109/MLSP.2017.8168139
Victor Bisot, R. Serizel, S. Essid, G. Richard
This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks, with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce an NMF-based feature learning stage before training deep networks, whose usefulness is highlighted in this paper, especially for multi-source acoustic environments such as sound scenes. We rely on two established NMF techniques, one unsupervised and one supervised, to learn better input representations for deep neural networks. This allows us, with simple architectures, to reach performance competitive with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets, as well as the best systems from the 2016 DCASE challenge.
Title: Leveraging deep neural networks with nonnegative representations for improved environmental sound classification. Venue: 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6.
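The NMF feature-learning stage can be pictured as factoring a non-negative spectrogram V into a dictionary W and activations H, with H then serving as the network input. The sketch below uses the classic multiplicative updates for the squared Euclidean cost as a stand-in; the abstract does not say which NMF variants or cost functions the authors use, so the algorithm choice, function names, and toy matrix are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for v_row, wh_row in zip(v, wh)
               for x, y in zip(v_row, wh_row))


def nmf(v, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor V ~ W H with multiplicative updates (squared Euclidean cost)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h, initial_error


# Toy non-negative "spectrogram" (4 frequency bins x 5 frames).
v = [[1.0, 0.0, 2.0, 0.0, 1.0],
     [2.0, 0.0, 4.0, 0.0, 2.0],
     [0.0, 3.0, 0.0, 1.5, 0.0],
     [0.0, 1.0, 0.0, 0.5, 0.0]]
w, h, initial_error = nmf(v, rank=2)
final_error = frobenius_error(v, w, h)
print(initial_error, final_error)
```

Each column of `h` is a non-negative feature vector for one frame; in a pipeline like the one described, such vectors (possibly pooled over time) would replace raw spectrogram frames as the classifier input.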