Mutual singular spectrum analysis for bioacoustics classification
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168113 | Pages: 1-6
B. Gatto, J. Colonna, E. M. Santos, E. Nakamura
Bioacoustic signal classification is an important tool in environmental monitoring, as it provides a means of efficiently acquiring information from areas that are often infeasible to approach. To address these challenges, bioacoustic classification systems should meet certain requirements, such as operating with low computational resources. In this paper, we propose a novel bioacoustic signal classification method that involves no preprocessing and is able to match sets of signals. The advantages of the proposed method include a novel and compact representation of bioacoustic signals that is independent of signal length. In addition, no preprocessing such as segmentation, noise reduction, or syllable extraction is required. We show that the method is theoretically and practically attractive through experimental results on a publicly available bioacoustics dataset.
{"title":"Mutual singular spectrum analysis for bioacoustics classification","authors":"B. Gatto, J. Colonna, E. M. Santos, E. Nakamura","doi":"10.1109/MLSP.2017.8168113","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168113","url":null,"abstract":"Bioacoustics signals classification is an important instrument used in environmental monitoring as it gives the means to efficiently acquire information from the areas, which most of the time are unfeasible to approach. To address these challenges, bioacoustics signals classification systems should meet some requirements, such as low computational resources capabilities. In this paper, we propose a novel bioacoustics signals classification method where no preprocessing techniques are involved and which is able to match sets of signals. The advantages of our proposed method include: a novel and compact representation for bioacoustics signals, which is independent of the signals length. In addition, no preprocessing is required, such as segmentation, noise reduction or syllable extraction. We show that our method is theoretically and practically attractive through experimental results employing a publicity available bioacoustics signal dataset.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"325 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76548095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A layer-block-wise pipeline for memory and bandwidth reduction in distributed deep learning
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168127 | Pages: 1-6
Haruki Mori, Tetsuya Youkawa, S. Izumi, M. Yoshimoto, H. Kawaguchi, Atsuki Inoue
This paper describes a pipelined stochastic gradient descent (SGD) algorithm and its hardware architecture with a memory-distributed structure. In the proposed architecture, each pipeline stage takes charge of multiple layers: a "layer block." In the layer-block-wise pipeline, each worker holds far fewer weight parameters during training than in conventional multithreading because the weight memory is distributed across the workers assigned to the pipeline stages. The memory capacity of 2.25 GB for the proposed four-stage pipeline is about half of the 3.82 GB for multithreading when the batch size is 32 in VGG-F. Unlike multithreaded data parallelism, no parameter server for weight updates or shared I/O data bus is necessary, so the memory bandwidth is drastically reduced. The proposed four-stage pipeline needs to transfer only 36.3 MB and 17.0 MB per batch for the forward-propagation and backpropagation processes, respectively, whereas four-thread multithreading requires 974 MB overall for the send and receive processes that unify its weight parameters. At a parallelization degree of four, the proposed pipeline maintains training convergence to within a factor of 1.12 of the conventional multithreaded architecture, even though both the memory capacity and the memory bandwidth are reduced.
{"title":"A layer-block-wise pipeline for memory and bandwidth reduction in distributed deep learning","authors":"Haruki Mori, Tetsuya Youkawa, S. Izumi, M. Yoshimoto, H. Kawaguchi, Atsuki Inoue","doi":"10.1109/MLSP.2017.8168127","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168127","url":null,"abstract":"This paper describes a pipelined stochastic gradient descent (SGD) algorithm and its hardware architecture with a memory distributed structure. In the proposed architecture, a pipeline stage takes charge of multiple layers: a “layer block.” The layer-block-wise pipeline has much less weight parameters for network training than conventional multithreading because weight memory is distributed to workers assigned to pipeline stages. The memory capacity of 2.25 GB for the four-stage proposed pipeline is about half of the 3.82 GB for multithreading when a batch size is 32 in VGG-F. Unlike multithreaded data parallelism, no parameter server for weight update or shared I/O data bus is necessary. Therefore, the memory bandwidth is drastically reduced. The proposed four-stage pipeline only needs memory bandwidths of 36.3 MB and 17.0 MB per batch, respectively, for forward propagation and backpropagation processes, whereas four-thread multithreading requires a bandwidth of 974 MB overall for send and receive processes to unify its weight parameters. At the parallelization degree of four, the proposed pipeline maintains training convergence by a factor of 1.12, compared with the conventional multithreaded architecture although the memory capacity and the memory bandwidth are decreased.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74535184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A regularized sequential dictionary learning algorithm for fMRI data analysis
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168146 | Pages: 1-6
A. Seghouane, Asif Iqbal
Sequential dictionary learning algorithms have been successfully applied to a number of image processing problems. In several of these problems, however, the data used for dictionary learning are structured matrices with notions of smoothness in the column direction. This prior information, which can be expressed as a smoothness constraint on the learned dictionary atoms, has not been included in existing dictionary learning algorithms. In this paper, we remedy this situation by proposing a regularized sequential dictionary learning algorithm. The proposed algorithm differs from existing ones in its dictionary update stage: it generates smooth dictionary atoms by solving a regularized rank-one matrix approximation problem, where the regularization is introduced via a penalty in the dictionary update stage. Experimental results on synthetic and real data illustrate the performance of the proposed algorithm.
{"title":"A regularized sequential dictionary learning algorithm for fmri data analysis","authors":"A. Seghouane, Asif Iqbal","doi":"10.1109/MLSP.2017.8168146","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168146","url":null,"abstract":"Sequential dictionary learning algorithms have been successfully applied to a number of image processing problems. In a number of these problems however, the data used for dictionary learning are structured matrices with notions of smoothness in the column direction. This prior information which can be traduced as a smoothness constraint on the learned dictionary atoms has not been included in existing dictionary learning algorithms. In this paper, we remedy to this situation by proposing a regularized sequential dictionary learning algorithm. The proposed algorithm differs from the existing ones in their dictionary update stage. The proposed algorithm generates smooth dictionary atoms via the solution of a regularized rank-one matrix approximation problem where regularization is introduced via penalization in the dictionary update stage. Experimental results on synthetic and real data illustrating the performance of the proposed algorithm are provided.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"51 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79798684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimation of interventional effects of features on prediction
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168175 | Pages: 1-6
Patrick Blöbaum, Shohei Shimizu
The interpretability of prediction mechanisms with respect to the underlying prediction problem is often unclear. While several studies have focused on developing prediction models with meaningful parameters, the causal relationships between the predictors and the actual prediction have not been considered. Here, we connect the underlying causal structure of the data generation process with the causal structure of a prediction mechanism. To achieve this, we propose a framework that identifies the feature with the greatest causal influence on the prediction and estimates the causal intervention on a feature that is necessary to obtain a desired prediction. The general concept of the framework places no restrictions on data linearity; however, we focus on an implementation for linear data here. The applicability of the framework is evaluated on artificial data and demonstrated on real-world data.
{"title":"Estimation of interventional effects of features on prediction","authors":"Patrick Blöbaum, Shohei Shimizu","doi":"10.1109/MLSP.2017.8168175","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168175","url":null,"abstract":"The interpretability of prediction mechanisms with respect to the underlying prediction problem is often unclear. While several studies have focused on developing prediction models with meaningful parameters, the causal relationships between the predictors and the actual prediction have not been considered. Here, we connect the underlying causal structure of a data generation process and the causal structure of a prediction mechanism. To achieve this, we propose a framework that identifies the feature with the greatest causal influence on the prediction and estimates the necessary causal intervention of a feature such that a desired prediction is obtained. The general concept of the framework has no restrictions regarding data linearity; however, we focus on an implementation for linear data here. The framework applicability is evaluated using artificial data and demonstrated using real-world data.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"20 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88589018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive sparse modeling and shifted-Poisson likelihood based approach for low-dose CT image reconstruction
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168124 | Pages: 1-6
Siqi Ye, S. Ravishankar, Y. Long, J. Fessler
Recent research in computed tomographic (CT) imaging has focused on techniques that reduce the X-ray radiation dose without loss of quality in the reconstructed images or volumes. While penalized weighted least squares (PWLS) approaches have been popular for CT image reconstruction, their performance degrades at very low dose levels due to the inaccuracy of the underlying WLS statistical model. We propose a new formulation for low-dose CT image reconstruction that combines a shifted-Poisson likelihood function with a data-adaptive regularizer based on a sparsifying transform model for images. The sparsifying transform is pre-learned from a dataset of patches extracted from CT images. The nonconvex cost function of the proposed penalized-likelihood reconstruction with sparsifying-transform regularizer (PL-ST) is optimized by alternating between a sparse coding step and an image update step. The image update step minimizes a series of convex quadratic majorizers using a relaxed linearized augmented Lagrangian method with ordered subsets, reducing the number of (expensive) forward and backward projection operations. Numerical experiments show that, at low dose levels, the proposed data-driven PL-ST approach outperforms prior methods employing a nonadaptive edge-preserving regularizer. PL-ST also outperforms the prior PWLS-ST approach at very low X-ray doses.
{"title":"Adaptive sparse modeling and shifted-poisson likelihood based approach for low-dosect image reconstruction","authors":"Siqi Ye, S. Ravishankar, Y. Long, J. Fessler","doi":"10.1109/MLSP.2017.8168124","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168124","url":null,"abstract":"Recent research in computed tomographic imaging has focused on developing techniques that enable reduction of the X-ray radiation dose without loss of quality of the reconstructed images or volumes. While penalized weighted-least squares (PWLS) approaches have been popular for CT image reconstruction, their performance degrades for very low dose levels due to the inaccuracy of the underlying WLS statistical model. We propose a new formulation for low-dose CT image reconstruction based on a shifted-Poisson model based likelihood function and a data-adaptive regularizer using the sparsifying transform model for images. The sparsifying transform is pre-learned from a dataset of patches extracted from CT images. The nonconvex cost function of the proposed penalized-likelihood reconstruction with sparsifying transforms regularizer (PL-ST) is optimized by alternating between a sparse coding step and an image update step. The image update step deploys a series of convex quadratic majorizers that are optimized using a relaxed linearized augmented Lagrangian method with ordered-subsets, reducing the number of (expensive) forward and backward projection operations. Numerical experiments show that for low dose levels, the proposed data-driven PL-ST approach outperforms prior methods employing a nonadaptive edge-preserving regularizer. PL-ST also outperforms prior PWLS-ST approach at very low X-ray doses.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"86 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85993930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compact kernel classifiers trained with minimum classification error criterion
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168184 | Pages: 1-6
Ryoma Tani, Hideyuki Watanabe, S. Katagiri, M. Ohsaki
Unlike the Support Vector Machine (SVM), Kernel Minimum Classification Error (KMCE) training frees the kernels from the training samples and jointly optimizes the weights and the kernel locations. Exploiting this property, we propose a new method for developing compact (small-scale but highly accurate) kernel classifiers by applying KMCE training to support vectors (SVs) that are selected, based on the weight vector norm, from the original SVs produced by a multi-class SVM (MSVM). We evaluate the proposed method on four classification tasks and demonstrate its effectiveness: classification accuracy drops only from 99.1% to 89.1% while using just 10% of the original SVs. In addition, we show mathematically that the value of an MSVM kernel weight indicates the geometric relation between a training sample and the margin boundaries.
{"title":"Compact kernel classifiers trained with minimum classification error criterion","authors":"Ryoma Tani, Hideyuki Watanabe, S. Katagiri, M. Ohsaki","doi":"10.1109/MLSP.2017.8168184","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168184","url":null,"abstract":"Unlike Support Vector Machine (SVM), Kernel Minimum Classification Error (KMCE) training frees kernels from training samples and jointly optimizes weights and kernel locations. Focusing on this feature of KMCE training, we propose a new method for developing compact (small scale but highly accurate) kernel classifiers by applying KMCE training to support vectors (SVs) that are selected (based on the weight vector norm) from the original SVs produced by the Multi-class SVM (MSVM). We evaluate our proposed method in four classification tasks and clearly demonstrate its effectiveness: only a 3% drop in classification accuracy (from 99.1 to 89.1%) with just 10% of the original SVs. In addition, we mathematically reveal that the value of MSVM's kernel weight indicates the geometric relation between a training sample and margin boundaries.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"71 1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83703392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DTW-Approach for uncorrelated multivariate time series imputation
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168165 | Pages: 1-6
Thi-Thu-Hong Phan, É. Poisson, A. Bigand, A. Lefebvre
Missing data are inevitable in almost all domains of applied science, and analyzing data with missing values can lead to a loss of efficiency and unreliable results, especially when large sub-sequences are missing. Well-known methods for multivariate time series imputation typically require high correlations between the series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low- or un-correlated multivariate time series, under an assumption of recurrent data. The method involves two main steps. First, we find the sub-sequence most similar to the sub-sequence before (resp. after) a gap, based on shape-feature extraction and the Dynamic Time Warping (DTW) algorithm. Second, we fill the gap with the sub-sequence that follows (resp. precedes) the most similar one in the signal containing the missing values. Experimental results show that our approach outperforms several related methods for multivariate time series with low or no correlation, provided each signal carries effective information.
{"title":"DTW-Approach for uncorrelated multivariate time series imputation","authors":"Thi-Thu-Hong Phan, É. Poisson, A. Bigand, A. Lefebvre","doi":"10.1109/MLSP.2017.8168165","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168165","url":null,"abstract":"Missing data are inevitable in almost domains of applied sciences. Data analysis with missing values can lead to a loss of efficiency and unreliable results, especially for large missing sub-sequence(s). Some well-known methods for multivariate time series imputation require high correlations between series or their features. In this paper, we propose an approach based on the shape-behaviour relation in low/un-correlated multivariate time series under an assumption of recurrent data. This method involves two main steps. Firstly, we find the most similar sub-sequence to the sub-sequence before (resp. after) a gap based on the shape-features extraction and Dynamic Time Warping algorithms. Secondly, we fill in the gap by the next (resp. previous) sub-sequence of the most similar one on the signal containing missing values. Experimental results show that our approach performs better than several related methods in case of multivariate time series having low/non-correlations and effective information on each signal.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85671399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Approximate method of variational Bayesian matrix factorization with sparse prior
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168156 | Pages: 1-4
Ryota Kawasumi, K. Takeda
We study the problem of matrix factorization by the variational Bayes method, under the assumption that the observed matrix is the product of a low-rank dense matrix and a sparse matrix, with additive noise. Under the assumption of a Laplace prior for the sparse matrix, we analytically derive an approximate solution of the matrix factorization by minimizing the Kullback-Leibler divergence between the posterior and a trial function. By evaluating our solution numerically, we also discuss the accuracy of the matrix factorization it achieves.
{"title":"Approximate method of variational Bayesian matrix factorization with sparse prior","authors":"Ryota Kawasumi, K. Takeda","doi":"10.1109/MLSP.2017.8168156","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168156","url":null,"abstract":"We study the problem of matrix factorization by variational Bayes method, under the assumption that observed matrix is the product of low-rank dense and sparse matrices with additional noise. Under assumption of Laplace distribution for sparse matrix prior, we analytically derive an approximate solution of matrix factorization by minimizing Kullback-Leibler divergence between posterior and trial function. By evaluating our solution numerically, we also discuss accuracy of matrix factorization of our analytical solution.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"18 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88919391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Missing component restoration for masked speech signals based on time-domain spectrogram factorization
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168125 | Pages: 1-6
Shogo Seki, H. Kameoka, T. Toda, K. Takeda
While time-frequency masking is a powerful approach to speech enhancement in terms of signal recovery accuracy (e.g., signal-to-noise ratio), it can over-suppress and damage speech components, limiting the performance of subsequent speech processing systems. To overcome this shortcoming, this paper proposes a method for restoring the missing components of time-frequency-masked speech spectrograms based on direct estimation of a time-domain signal. The proposed method takes into account both the local interdependencies of the elements of the complex spectrogram, which derive from the redundancy of the time-frequency representation, and the global structure of the magnitude spectrogram. The effectiveness of the proposed method is demonstrated through experimental evaluation using spectrograms filtered with masks designed to enhance noisy speech. Experimental results show that the proposed method significantly outperforms conventional methods and has the potential to estimate both phase and magnitude spectra simultaneously and precisely.
{"title":"Missing component restoration for masked speech signals based on time-domain spectrogram factorization","authors":"Shogo Seki, H. Kameoka, T. Toda, K. Takeda","doi":"10.1109/MLSP.2017.8168125","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168125","url":null,"abstract":"While time-frequency masking is a powerful approach for speech enhancement in terms of signal recovery accuracy (e.g., signal-to-noise ratio), it can over-suppress and damage speech components, leading to limited performance of succeeding speech processing systems. To overcome this shortcoming, this paper proposes a method to restore missing components of time-frequency masked speech spectrograms based on direct estimation of a time domain signal. The proposed method allows us to take account of the local interdepen-dencies of the elements of the complex spectrogram derived from the redundancy of a time-frequency representation as well as the global structure of the magnitude spectrogram. The effectiveness of the proposed method is demonstrated through experimental evaluation, using spectrograms filtered with masks to enhance of noisy speech. Experimental results show that the proposed method significantly outperformed conventional methods, and has the potential to estimate both phase and magnitude spectra simultaneously and precisely.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"23 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90155060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differential mutual information forward search for multi-kernel discriminant-component selection with an application to privacy-preserving classification
Pub Date: 2017-09-01 | DOI: 10.1109/MLSP.2017.8168177 | Pages: 1-6
Thee Chanyaswad, Mert Al, J. M. Chang, S. Kung
In machine learning, feature engineering is a pivotal stage in building a high-quality predictor. This work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature map and its variants. Seeking the right subset of kernels for the mKDCA feature map, however, can be challenging. We therefore consider the problem of kernel selection and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as is theoretically supported by mutual information and Fisher's discriminant analysis, while incremental forward search removes redundancy among the kernels. Finally, we illustrate the potential of the method in an application to privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature maps can enhance utility classification performance while successfully preserving data privacy. Specifically, the results show that the proposed DMI forward search outperforms the state-of-the-art and, at much smaller computational cost, performs as well as the optimal, yet computationally expensive, exhaustive search.
{"title":"Differential mutual information forward search for multi-kernel discriminant-component selection with an application to privacy-preserving classification","authors":"Thee Chanyaswad, Mert Al, J. M. Chang, S. Kung","doi":"10.1109/MLSP.2017.8168177","DOIUrl":"https://doi.org/10.1109/MLSP.2017.8168177","url":null,"abstract":"In machine learning, feature engineering has been a pivotal stage in building a high-quality predictor. Particularly, this work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature-map and its variants. However, seeking the right subset of kernels for mKDCA feature-map can be challenging. Therefore, we consider the problem of kernel selection, and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as is theoretically supported by mutual information and Fisher's discriminant analysis. On the other hand, incremental forward search plays a role in removing redundancy among kernels. Finally, we illustrate the potential of the method via an application in privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature-maps can enhance the utility classification performance, while successfully preserve the data privacy. Specifically, the results show that the proposed DMI forward search method can perform better than the state-of-the-art, and, with much smaller computational cost, can perform as well as the optimal, yet computationally expensive, exhaustive search.","PeriodicalId":6542,"journal":{"name":"2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"23 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73191416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}