Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168169
A. Liutkus, Kazuyoshi Yoshii
This paper presents an accelerated version of positive semidefinite tensor factorization (PSDTF) for blind source separation. PSDTF outperforms nonnegative matrix factorization (NMF) by dropping the arguable assumption that audio signals can be whitened in the frequency domain by the short-time Fourier transform (STFT). Indeed, this assumption holds only in an ideal situation in which each frame is infinitely long and the target signal is completely stationary within each frame. PSDTF thus deals with full covariance matrices over frequency bins instead of forcing them to be diagonal as NMF does. Although PSDTF significantly outperforms NMF in terms of separation performance, it incurs a heavy computational cost due to the repeated inversion of large covariance matrices. To solve this problem, we propose an intermediate model based on diagonal plus low-rank covariance matrices and derive an expectation-maximization (EM) algorithm that efficiently updates the parameters of PSDTF. Experimental results showed that our method can dramatically reduce the computational cost of PSDTF by several orders of magnitude without a significant decrease in separation performance.
Title: "A diagonal plus low-rank covariance model for computationally efficient source separation" (2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6)
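The efficiency claim above rests on a standard linear-algebra fact: a diagonal-plus-low-rank covariance can be inverted cheaply via the Woodbury identity. The sketch below shows only that computational core, not the authors' full EM algorithm; the sizes and scaling are illustrative assumptions.

```python
import numpy as np

def woodbury_inverse(d, U):
    """Inverse of C = diag(d) + U @ U.T via the Woodbury identity.

    Costs O(F * L^2) instead of O(F^3) for an F x F covariance
    with a rank-L factor U (F x L)."""
    Dinv = 1.0 / d                           # inverse of the diagonal part
    DinvU = Dinv[:, None] * U                # F x L
    S = np.eye(U.shape[1]) + U.T @ DinvU     # small L x L capacitance matrix
    return np.diag(Dinv) - DinvU @ np.linalg.solve(S, DinvU.T)

rng = np.random.default_rng(0)
F, L = 200, 4
d = rng.uniform(1.0, 2.0, F)
U = rng.standard_normal((F, L)) * 0.1
C = np.diag(d) + U @ U.T
print(np.allclose(woodbury_inverse(d, U), np.linalg.inv(C)))  # True
```

Only the small L x L system is solved directly, which is what makes repeated inversions of large covariances affordable.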
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168149
Abdolreza Shirvani, B. Oommen
Data in Signal Processing (SP) applications is being generated at a super-exponential, ever-increasing rate. A meaningful way to pre-process it so as to achieve feasible computation is to partition the data [5]. Indeed, partitioning is one of the most difficult problems in computing, and it has extensive applications in solving real-life problems, especially when the amount of SP data (i.e., images, voices, speakers, libraries, etc.) to be processed is prohibitively large; the problem is known to be NP-hard. The benchmark solution to the Equi-Partitioning Problem (EPP) has involved the classic field of Learning Automata (LA), and the corresponding algorithm, the Object Migration Automaton (OMA), has been used in numerous application domains. The OMA is a fixed-structure machine, however, and it does not incorporate the Pursuit concept that has recently, and significantly, enhanced the field of LA. In this paper, we pioneer the incorporation of the Pursuit concept into the OMA. We do this by a non-intuitive paradigm, namely, removing (or discarding) from the query stream those queries that could be counter-productive. This can be perceived as a filtering agent triggered by a pursuit-based module. The resulting machine, referred to as the Pursuit OMA (POMA), has been rigorously tested in all the standard benchmark environments; in certain extreme environments it is almost ten times faster than the original OMA. The application of the POMA to signal processing problems is extremely promising.
Title: "Partitioning in signal processing using the object migration automaton and the pursuit paradigm" (MLSP 2017, pp. 1-6)
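The query-filtering idea can be caricatured in a few lines. The sketch below is a loose illustration under assumed details — the frequency threshold and data structures are hypothetical, not the authors' POMA design: it maintains maximum-likelihood estimates of pair-query frequencies and drops rare, potentially counter-productive queries before they would trigger migrations in the partitioning automaton.

```python
import numpy as np

def pursuit_filter(queries, n_objects, threshold=1.0):
    """Loose pursuit-style filter over a query stream (illustrative only).

    Each query is a pair of objects accessed together. A query passes
    only if its estimated joint frequency is at least `threshold` times
    the mean pair frequency, on the premise that rare pairs are likely
    noise and would cause counter-productive migrations."""
    counts = np.zeros((n_objects, n_objects))
    passed = []
    for i, j in queries:
        counts[i, j] += 1
        counts[j, i] += 1
        mean = counts[np.triu_indices(n_objects, 1)].mean()
        if counts[i, j] >= threshold * mean:
            passed.append((i, j))
    return passed

queries = [(0, 1)] * 10 + [(2, 3)]
print(len(pursuit_filter(queries, n_objects=4)))  # 10 (the lone (2, 3) query is dropped)
```

In the real POMA this filter is coupled to the OMA's reward/penalty machinery rather than acting on raw counts.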
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168192
Shi-Xue Wen, Jun Du, Chin-Hui Lee
We first examine the generalization issue with the noise samples used to train nonlinear mapping functions between noisy and clean speech features for deep neural network (DNN) based speech enhancement. We then establish an empirical proof explaining why the DNN-based approach has a good noise generalization capability provided that a large collection of noise types is included when generating diverse noisy speech samples for training. It is shown that an arbitrary noise signal segment can be well represented by a linear combination of microstructure noise bases. Accordingly, we propose generating these mixing noise signals by designing a set of compact and analytic noise bases, without using any realistic noise types. The experiments demonstrate that this noise generation scheme yields performance comparable to that obtained using 50 real noise types. Furthermore, by supplementing the collected noise types with the synthesized noise bases, we observe remarkable performance improvements, implying that not only can the burden of collecting a large set of real-world noise signals be alleviated, but a good noise generalization capability can also be achieved.
Title: "On generating mixing noise signals with basis functions for simulating noisy speech and learning dnn-based speech enhancement models" (MLSP 2017, pp. 1-6)
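The basis-mixing idea can be sketched as follows. The band-limited "bases" and the random mixing scheme here are illustrative assumptions, not the paper's actual analytic basis design:

```python
import numpy as np

def make_noise_bases(n_bases, n_fft=512):
    """Hypothetical analytic noise bases: flat narrow-band magnitude
    spectra that tile the frequency axis, one band per basis."""
    bases = np.zeros((n_bases, n_fft // 2 + 1))
    edges = np.linspace(0, n_fft // 2 + 1, n_bases + 1).astype(int)
    for k in range(n_bases):
        bases[k, edges[k]:edges[k + 1]] = 1.0
    return bases

def synth_noise(bases, weights, n_frames, rng):
    """Mix the bases with nonnegative weights and add frame-wise random
    fluctuation, returning one simulated noise magnitude spectrogram."""
    mag = weights @ bases                    # target magnitude spectrum
    return np.tile(mag, (n_frames, 1)) * rng.uniform(0.5, 1.5, (n_frames, bases.shape[1]))

rng = np.random.default_rng(1)
B = make_noise_bases(8)
S = synth_noise(B, rng.uniform(0, 1, 8), n_frames=100, rng=rng)
print(S.shape)  # (100, 257)
```

Varying the weights per training sample simulates an open-ended variety of noise types from a small fixed basis set.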
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168137
M. D. L. Alvarez, H. Hastie, D. Lane
Time-series sensor data processing is indispensable for system monitoring. Working with autonomous vehicles requires mechanisms that provide insightful information about the status of a mission. In a setting where time and resources are limited, trajectory classification plays a vital role in mission monitoring and failure detection. In this context, we use navigational data to interpret trajectory patterns and classify them. We implement Long Short-Term Memory (LSTM) based Recurrent Neural Networks (RNNs) that learn the most commonly used survey trajectory patterns from surveys executed by two types of Autonomous Underwater Vehicles (AUVs). We compare the performance of our network against baseline machine learning methods.
Title: "Navigation-Based learning for survey trajectory classification in autonomous underwater vehicles" (MLSP 2017, pp. 1-6)
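For reference, one step of a standard LSTM cell is compact enough to write out directly. This is the generic textbook formulation, not the authors' specific architecture; stacking it over a navigation time series and feeding the final hidden state to a softmax layer gives a minimal trajectory classifier.

```python
import numpy as np

def lstm_step(x, h, c, Wx, Wh, b):
    """One LSTM cell step; weight columns are concatenated as the
    [input, forget, cell, output] gates."""
    z = x @ Wx + h @ Wh + b                  # all four gates in one product
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = (1 / (1 + np.exp(-a)) for a in (i, f, o))  # sigmoid gates
    g = np.tanh(g)                           # candidate cell update
    c_new = f * c + i * g                    # gated memory update
    h_new = o * np.tanh(c_new)               # exposed hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
Wx, Wh, b = rng.standard_normal((3, 20)), rng.standard_normal((5, 20)), np.zeros(20)
h, c = np.zeros((1, 5)), np.zeros((1, 5))
h, c = lstm_step(rng.standard_normal((1, 3)), h, c, Wx, Wh, b)
print(h.shape)  # (1, 5)
```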
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168142
Li Li, H. Kameoka, S. Makino
The non-negative matrix factorization (NMF) approach has been shown to work reasonably well for monaural speech enhancement tasks. This paper addresses two shortcomings of the original NMF approach: (1) the objective functions for basis training and separation (Wiener filtering) are inconsistent (the basis spectra are not trained so that the separated signal becomes optimal); and (2) minimizing spectral divergence measures does not necessarily lead to an enhancement in the feature domain (e.g., the cepstral domain) or in terms of perceived quality. To address the first shortcoming, we previously proposed an algorithm for Discriminative NMF (DNMF), which optimizes the same objective for basis training and separation. To address the second, we previously introduced frameworks called cepstral distance regularized NMF (CDRNMF) and mel-generalized cepstral distance regularized NMF (MGCRNMF), which aim to enhance speech in both the spectral and feature domains. This paper combines the goals of DNMF and MGCRNMF by incorporating the MGC regularizer into the DNMF objective function, and proposes an algorithm for parameter estimation. The experimental results revealed that the proposed method outperformed the baseline approaches.
Title: "Mel-Generalized cepstral regularization for discriminative non-negative matrix factorization" (MLSP 2017, pp. 1-6)
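As a baseline for the regularized variants, the plain multiplicative updates for KL-NMF look as follows. This hedged sketch shows only the unregularized core that CDRNMF/MGCRNMF build on; the paper's MGC regularizer would add extra terms to these updates, which are not reproduced here.

```python
import numpy as np

def nmf_kl(V, K, n_iter=200, seed=0):
    """Multiplicative-update NMF under the generalized KL divergence:
    V (F x T, nonnegative) is approximated by W @ H with W (F x K),
    H (K x T). Each update is guaranteed not to increase the divergence."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.uniform(0.1, 1, (F, K))
    H = rng.uniform(0.1, 1, (K, T))
    for _ in range(n_iter):
        WH = W @ H + 1e-12
        W *= (V / WH) @ H.T / H.sum(axis=1)          # update bases
        WH = W @ H + 1e-12
        H *= W.T @ (V / WH) / W.sum(axis=0)[:, None]  # update activations
    return W, H

rng = np.random.default_rng(1)
V = rng.uniform(0.1, 1, (30, 40))
W, H = nmf_kl(V, K=5, n_iter=100)
```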
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168154
Akihiko Kasagi, T. Tabaru, H. Tamura
Convolutional neural networks (CNNs), in which several convolutional layers extract feature patterns from an input image, are among the most popular network architectures used for image classification. The convolutional computation, however, carries a high computational cost, resulting in increased power consumption and processing time. In this paper, we propose a novel algorithm that substitutes a single layer for the pair formed by a convolutional layer and the following average-pooling layer. The key idea of the proposed scheme is to compute the output of the pair of original layers without computing the convolution itself. To this end, our algorithm first generates summed-area tables (SATs) of the input images and computes the output values directly from the SATs. We implemented our algorithm for both forward and backward propagation to evaluate its performance. Our experimental results showed that our algorithm runs 17.1 times faster than the original layer pair under the same parameter settings as used in ResNet-34.
Title: "Fast algorithm using summed area tables with unified layer performing convolution and average pooling" (MLSP 2017, pp. 1-6)
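The SAT trick is easy to demonstrate for the pooling half in isolation: once the cumulative table is built, every window sum is four table lookups regardless of window size. This sketch covers only pooling from a SAT, not the paper's fused convolution-plus-pooling layer.

```python
import numpy as np

def sat(x):
    """Summed-area table with a zero top row/left column for clean indexing."""
    s = np.zeros((x.shape[0] + 1, x.shape[1] + 1))
    s[1:, 1:] = x.cumsum(axis=0).cumsum(axis=1)
    return s

def avg_pool_sat(x, p):
    """p x p average pooling (stride p) from the SAT: each window sum is
    S[i+p,j+p] - S[i,j+p] - S[i+p,j] + S[i,j] (inclusion-exclusion)."""
    s = sat(x)
    H, W = x.shape[0] // p * p, x.shape[1] // p * p
    i, j = np.arange(0, H, p), np.arange(0, W, p)
    return (s[np.ix_(i + p, j + p)] - s[np.ix_(i, j + p)]
            - s[np.ix_(i + p, j)] + s[np.ix_(i, j)]) / (p * p)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6))
direct = x.reshape(4, 2, 3, 2).mean(axis=(1, 3))   # reference 2x2 pooling
print(np.allclose(avg_pool_sat(x, 2), direct))     # True
```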
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168179
Weizheng Yan, S. Plis, V. Calhoun, Shengfeng Liu, R. Jiang, T. Jiang, J. Sui
Deep learning has gained considerable attention in the scientific community, breaking benchmark records in many fields such as speech and visual recognition [1]. Motivated by these advances, we extend deep learning to brain imaging classification and propose a framework, called “deep neural network (DNN) + layer-wise relevance propagation (LRP)”, to distinguish schizophrenia patients (SZ) from healthy controls (HCs) using functional network connectivity (FNC). 1100 Chinese subjects from 7 sites are included, each with a 50×50 FNC matrix resulting from group ICA on resting-state fMRI data. The proposed DNN+LRP not only significantly improves classification accuracy compared to four state-of-the-art classification methods (84% vs. less than 79%, 10-fold cross-validation) but also enables identification of the FNC patterns that contribute most to SZ classification, which cannot easily be traced back in general DNN models. By conducting LRP, we identified the FNC patterns that exhibit the highest discriminative power in SZ classification. More importantly, when using leave-one-site-out cross-validation (6 sites for training, 1 site for testing, repeated 7 times in total), the cross-site classification accuracy reached 82%, suggesting the high robustness and generalization performance of the proposed method, promising wide utility in the community and great potential for biomarker identification of brain disorders.
Title: "Discriminating schizophrenia from normal controls using resting state functional network connectivity: A deep neural network and layer-wise relevance propagation method" (MLSP 2017, pp. 1-6)
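The LRP back-distribution itself is simple for dense layers. Below is a generic epsilon-rule sketch; the paper applies LRP to its trained DNN on FNC features, whereas this toy version with assumed positive weights only illustrates the redistribution rule and its conservation property.

```python
import numpy as np

def lrp_dense(weights, activations, R, eps=1e-6):
    """Epsilon-rule layer-wise relevance propagation through dense layers.

    `activations[l]` is the input to layer l; R is the relevance at the
    network output, redistributed to the inputs in proportion to each
    input's contribution to the pre-activations."""
    for W, a in zip(reversed(weights), reversed(activations)):
        z = a @ W
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilize the division
        s = R / z
        R = a * (s @ W.T)                          # conservative redistribution
    return R

rng = np.random.default_rng(0)
W1, W2 = rng.uniform(0.1, 1, (4, 5)), rng.uniform(0.1, 1, (5, 3))
x = rng.uniform(0.1, 1, (1, 4))
a1 = np.maximum(x @ W1, 0)                         # ReLU hidden layer
out = a1 @ W2
Rin = lrp_dense([W1, W2], [x, a1], out)
print(np.allclose(Rin.sum(), out.sum(), rtol=1e-3))  # True: relevance is conserved
```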
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168107
Tarn Nguyen, R. Raich, Xiaoli Z. Fern, Anh T. Pham
Manual labeling of individual instances is time-consuming. This is commonly resolved by labeling a bag of instances with a single common label or label set. However, this approach is still costly in time for large datasets. In this paper, we propose a mixed-supervision multi-instance multi-label learning model for learning from easily available meta-data information (MIML-AI). This auxiliary information is normally collected automatically with the data, e.g., an image's location or a document's author name. We propose a discriminative graphical model with exact inference to train a classifier based on auxiliary label information and a small number of labeled bags. This strategy utilizes meta-data as a means of providing a weaker label, as an alternative to intensive manual labeling. Experiments on real data illustrate the effectiveness of our proposed method relative to current approaches, which do not use the information from bags that contain only meta-data label information.
Title: "MIML-AI: Mixed-supervision multi-instance multi-label learning with auxiliary information" (MLSP 2017, pp. 1-6)
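As context for bag-level supervision, a standard multi-instance building block is the noisy-OR bag likelihood. This generic sketch is not the paper's discriminative graphical model; it merely illustrates the kind of instance-to-bag aggregation such models rely on, with a logistic instance model assumed for concreteness.

```python
import numpy as np

def bag_probability(X_bag, w):
    """Noisy-OR bag model: the bag is positive iff at least one of its
    instances is positive under a logistic instance classifier w."""
    p_inst = 1 / (1 + np.exp(-X_bag @ w))   # per-instance positive probability
    return 1 - np.prod(1 - p_inst)          # P(any instance positive)

# Two instances, zero weights: each instance is positive with prob. 0.5,
# so the bag is positive with probability 1 - 0.5^2.
print(bag_probability(np.zeros((2, 3)), np.zeros(3)))  # 0.75
```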
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168189
Kazuyoshi Yoshii, Eita Nakamura, Katsutoshi Itoyama, Masataka Goto
This paper presents a statistical method of audio source separation based on a nonparametric Bayesian extension of probabilistic latent component analysis (PLCA). A major approach to audio source separation is to use nonnegative matrix factorization (NMF), which approximates the magnitude spectrum of a mixture signal at each frame as the weighted sum of a limited number of source spectra. Another approach is to use PLCA, which regards the magnitude spectrogram as a two-dimensional histogram of “sound quanta” and classifies each quantum into one of the sources. While NMF has a physically natural interpretation, PLCA has been used successfully for music signal analysis. To enable PLCA to estimate the number of sources, we propose Dirichlet process PLCA (DP-PLCA) and derive two kinds of learning methods based on variational Bayes and collapsed Gibbs sampling. Unlike existing learning methods for nonparametric Bayesian NMF based on the beta or gamma processes (BP-NMF and GaP-NMF), our sampling method can efficiently search for the optimal number of sources without truncating the number of sources to be considered. Experimental results showed that DP-PLCA is superior to GaP-NMF in terms of source number estimation.
Title: "Infinite probabilistic latent component analysis for audio source separation" (MLSP 2017, pp. 1-6)
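The finite-Z PLCA that DP-PLCA extends has simple EM updates. A minimal sketch with a fixed number of components and no Dirichlet-process prior (the nonparametric machinery of the paper is not reproduced here):

```python
import numpy as np

def plca(V, Z, n_iter=100, seed=0):
    """EM for finite PLCA: the spectrogram V (F x T, nonnegative) is
    treated as a histogram of sound quanta and factorized as
    P(f, t) = sum_z P(z) P(f|z) P(t|z)."""
    rng = np.random.default_rng(seed)
    pz = np.full(Z, 1 / Z)
    pf = rng.dirichlet(np.ones(V.shape[0]), Z).T   # F x Z, columns sum to 1
    pt = rng.dirichlet(np.ones(V.shape[1]), Z).T   # T x Z, columns sum to 1
    for _ in range(n_iter):
        # E-step: posterior over the latent component z for each (f, t) bin
        joint = pz[None, None, :] * pf[:, None, :] * pt[None, :, :]  # F x T x Z
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reweight the posterior by the observed quanta
        q = V[:, :, None] * post
        pz, pf, pt = q.sum(axis=(0, 1)), q.sum(axis=1), q.sum(axis=0)
        pf /= pf.sum(axis=0, keepdims=True) + 1e-12
        pt /= pt.sum(axis=0, keepdims=True) + 1e-12
        pz /= pz.sum() + 1e-12
    return pz, pf, pt

rng = np.random.default_rng(0)
V = rng.uniform(0, 1, (16, 12))
pz, pf, pt = plca(V, Z=3, n_iter=20)
print(np.isclose(pz.sum(), 1.0))  # True
```

DP-PLCA replaces the fixed Z with a Dirichlet-process prior so that the number of active components is inferred from the data.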
Pub Date: 2017-09-01 · DOI: 10.1109/MLSP.2017.8168159
Rasmus Bonnevie, Mikkel N. Schmidt, Morten Mørup
Variational methods for approximate inference in Bayesian models optimize a lower bound on the marginal likelihood, but the optimization problem often suffers from being nonconvex and high-dimensional. This can be alleviated by working in a collapsed domain where part of the parameter space is marginalized out. We consider the KL-corrected collapsed variational bound and apply it to Dirichlet process mixture models, allowing us to reduce the optimization space considerably. We find that the variational bound exhibits consistent and exploitable structure, allowing the application of difference-of-convex optimization algorithms. We show how this yields an interpretable fixed-point update algorithm in the collapsed setting for the Dirichlet process mixture model. We connect this update formula to classical coordinate ascent updates, illustrating that the proposed improvement surprisingly reduces to the traditional scheme.
Title: "Difference-of-Convex optimization for variational kl-corrected inference in dirichlet process mixtures" (MLSP 2017, pp. 1-6)
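The difference-of-convex idea is easiest to see on a toy problem. The sketch below runs the concave-convex procedure (CCCP, a standard DC algorithm) on a one-dimensional double well; the objective and its split are illustrative choices, not the paper's collapsed variational bound, but the linearize-and-minimize fixed-point structure is the same.

```python
import numpy as np

def cccp_double_well(x0, n_iter=50):
    """CCCP on f(x) = x^4 - 2x^2, split as g(x) = x^4 (convex) minus
    h(x) = 2x^2 (convex). Each step linearizes h at the current iterate
    and minimizes the convex surrogate in closed form:
        x_{t+1} = argmin_x x^4 - 4*x_t*x = cbrt(x_t),
    which descends monotonically to a stationary point of f."""
    x = x0
    for _ in range(n_iter):
        x = np.cbrt(x)
    return x

print(cccp_double_well(0.2))  # 1.0 (the stationary point x = 1)
```

The closed-form surrogate minimizer is what makes such fixed-point updates interpretable, mirroring how the paper's updates reduce to a coordinate-ascent-like scheme.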