Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1423021
Joseph F. Murray, K. Kreutz-Delgado
Images can be coded accurately using a sparse set of vectors from an overcomplete dictionary, with potential applications in image compression and feature selection for pattern recognition. We discuss algorithms that perform sparse coding and make three contributions. First, we compare our overcomplete dictionary learning algorithm (FOCUSS-CNDL) with overcomplete independent component analysis (ICA). Second, noting that once a dictionary has been learned in a given domain the problem becomes one of choosing the vectors to form an accurate, sparse representation, we compare a recently developed algorithm (sparse Bayesian learning with adjustable-variance Gaussians) to well-known methods of subset selection: matching pursuit and FOCUSS. Third, noting that in some cases it may be necessary to find a non-negative sparse coding, we present a modified version of the FOCUSS algorithm that can find such non-negative codings.
Title: Sparse image coding using learned overcomplete dictionaries (pp. 579-588)
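Of the subset selection methods named in this abstract, matching pursuit is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes unit-norm dictionary atoms stored as plain Python lists, and the function name `matching_pursuit`, the toy dictionary, and the stopping threshold are all illustrative choices.

```python
import math

def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit over unit-norm atoms (lists of floats)."""
    residual = list(signal)
    coeffs = [0.0] * len(dictionary)
    for _ in range(n_atoms):
        # Correlate the current residual with every atom.
        inner = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        # Select the atom with the largest absolute correlation.
        k = max(range(len(dictionary)), key=lambda i: abs(inner[i]))
        if abs(inner[k]) < 1e-12:  # residual is (numerically) zero
            break
        # Accumulate the coefficient and subtract the atom's contribution.
        coeffs[k] += inner[k]
        residual = [r - inner[k] * a for r, a in zip(residual, dictionary[k])]
    return coeffs, residual

# Toy overcomplete dictionary: three unit-norm atoms in a 2-D space.
s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
signal = [2.0, 2.0]
coeffs, residual = matching_pursuit(signal, atoms, n_atoms=1)
```

In this toy case the signal lies along the third atom, so a single iteration selects it and leaves a residual that is zero up to rounding. A complete basis (the first two atoms alone) would need two non-zero coefficients for the same signal, which is the sense in which an overcomplete dictionary can give sparser codes.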
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1423031
C. Panazio, R.R. de F Attux
In this work, we propose an adaptive decision device based on a Kohonen network that can automatically generate the classes associated with each symbol of a 4^n-QAM constellation in the presence of non-linearities caused by I/Q imbalance and additive white Gaussian noise. The device can also compensate for phase and gain variations produced by a time-varying flat-fading channel. Our proposal can achieve optimality in the maximum-likelihood sense at a small computational cost. Furthermore, owing to the tracking ability inherent in the devised scheme, there is no need for an automatic gain controller or a phase-locked loop.
Title: A 4^N-QAM adaptive decision device to mitigate I/Q imbalance and impairments caused by time-varying flat fading channels (pp. 665-674)
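The abstract does not give the update rule, so the following is only a sketch of the general idea under stated assumptions: a winner-take-all competitive update (a Kohonen map with zero neighbourhood width) that lets one centroid per 4-QAM symbol follow a gain and phase offset. The names `nearest` and `track`, the step size, and the noise-free channel are hypothetical choices made for illustration.

```python
import cmath

def nearest(centroids, sample):
    """Index of the centroid closest to the received sample."""
    return min(range(len(centroids)), key=lambda i: abs(sample - centroids[i]))

def track(centroids, samples, step=0.1):
    """Decide each sample, then move the winning centroid toward it."""
    centroids = list(centroids)
    decisions = []
    for y in samples:
        k = nearest(centroids, y)
        decisions.append(k)
        # Competitive-learning update: only the winner is adapted.
        centroids[k] += step * (y - centroids[k])
    return centroids, decisions

# Ideal 4-QAM constellation used as the initial centroids.
ideal = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
# Assumed flat-fading distortion: gain 0.8 and a 0.2 rad phase rotation.
gain = cmath.rect(0.8, 0.2)
received = [gain * ideal[i % 4] for i in range(200)]
centroids, decisions = track(ideal, received)
```

Because the decision regions are defined by the adapted centroids rather than by fixed thresholds, the decisions stay correct as the centroids settle on the rotated and scaled constellation. This is the tracking behaviour that, according to the abstract, removes the need for a separate gain controller or phase-locked loop.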
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1422996
T. Imbiriba, A. Klautau, N. Parihar, S. Raghavan, J. Picone
This paper describes an open-source framework for developing speaker recognition systems. Among other features, it supports kernel classifiers, such as support vector and relevance vector machines. The paper also presents results for the IME corpus using Gaussian mixture models, which outperform previously published ones, and discusses strategies for applying discriminative classifiers to speaker recognition.
Title: GMM and kernel-based speaker recognition with the ISIP toolkit (pp. 371-380)
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1423024
E. Izquierdo, V. Guerra-Ones
A stability analysis of the approximate solution for spectral partitioning in image segmentation based on the Nyström extension is presented. Algorithmic modifications are introduced to improve the stability of the original technique reported by C. Fowlkes et al. (2004). The proposed improvements include a criterion for the selection of the initial sample and more stable estimates of inverse matrices. The proposed algorithm is validated by several computer experiments.
Title: Numerical stability of Nyström extension for image segmentation (pp. 609-614)
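For context, the standard Nyström approximation that this abstract refers to can be written as follows. This is the textbook form, not the modified algorithm proposed in the paper. The symbols are assumptions introduced here: $W$ is the affinity matrix, $A$ holds affinities among the $n$ sampled points, $B$ holds affinities between the sampled and the remaining points, and $C$ is the block that is never computed explicitly.

```latex
W = \begin{bmatrix} A & B \\ B^{\top} & C \end{bmatrix},
\qquad
\hat{W} = \begin{bmatrix} A \\ B^{\top} \end{bmatrix} A^{-1}
          \begin{bmatrix} A & B \end{bmatrix}
        = \begin{bmatrix} A & B \\ B^{\top} & B^{\top} A^{-1} B \end{bmatrix}.

% With the eigendecomposition A = U \Lambda U^{\top}, the eigenvectors
% are extended to all points by
\bar{U} = \begin{bmatrix} U \\ B^{\top} U \Lambda^{-1} \end{bmatrix}.
```

Both expressions depend on $A^{-1}$ (or $\Lambda^{-1}$), so a poorly conditioned sample block $A$ amplifies errors in the extension. This is consistent with the abstract's emphasis on how the initial sample is selected and how inverse matrices are estimated.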
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1422995
E. James, A. Barros, T. Yoshinori, D. Mandic, N. Ohnishi
In this paper we propose a simple algorithm to enhance the speech signal with the larger energy in a mixture of two sound sources. We use two microphones to acquire the sound signals and assume that each speaker is closer to one of the microphones. The algorithm combines auditory filter banks with two psychoacoustic concepts: lateral inhibition and binaural masking. Preliminary computer simulation experiments confirm the validity of the proposed algorithm by objective and subjective measures.
Title: Speech enhancement by lateral inhibition and binaural masking (pp. 365-370)
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1422960
Christopher P. Diehl
Leave-one-out (LOO) error estimation is an important statistical tool for assessing generalization performance. A number of papers have focused on LOO error estimation for support vector machines, but little work has addressed LOO error estimation when learning with smooth, convex margin loss functions. We consider the problem of approximating the LOO error estimate in the context of sparse kernel machine learning. We first motivate a general framework for learning sparse kernel machines that involves minimizing a regularized, smooth, strictly convex margin loss. Then we present an approximation of the LOO error for the family of learning algorithms admissible in the general framework. We examine the implications of the approximation and review preliminary experimental results demonstrating the utility of the approach.
Title: Approximate leave-one-out error estimation for learning with smooth, strictly convex margin loss functions (pp. 63-72)
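To make concrete what is being approximated, here is a brute-force sketch of the exact LOO error, which retrains once per held-out example. It is not the paper's approximation. The nearest-class-mean classifier is a deliberately simple stand-in for a sparse kernel machine, and the names `loo_error`, `fit_centroids`, and `predict_centroid` are hypothetical.

```python
def loo_error(xs, ys, fit, predict):
    """Exact leave-one-out error: retrain n times, test on the held-out point."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

def fit_centroids(xs, ys):
    """Stand-in learner: one mean per class on 1-D inputs."""
    means = {}
    for label in set(ys):
        points = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(points) / len(points)
    return means

def predict_centroid(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

xs = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4]
ys = [-1, -1, -1, 1, 1, 1]
error = loo_error(xs, ys, fit_centroids, predict_centroid)
```

The loop calls `fit` once per training example, so the cost grows with the dataset size times the cost of one training run. That expense is the usual motivation for approximations that estimate the LOO error from a single trained model.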
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1422992
E. B. Bilcu, J. Astola, J. Saarinen
Text-to-phoneme (TTP) mapping is a preliminary step in text-to-speech synthesis, and it affects the naturalness and understandability of synthetic speech. In this paper, we propose a hybrid neural network/rule-based system for bilingual text-to-phoneme mapping. Our system uses three neural networks and a simple rule to perform the phoneme transcription. The first network is trained to convert the letters of the first language into their corresponding phonemes, the second is used to obtain the phonemes for the second language, and the third, together with a simple rule, is responsible for language recognition. The proposed approach can easily be extended to multilingual applications by introducing more neural networks. Simulations performed on a bilingual dictionary (English and French) show that our method improves phoneme accuracy over the approach that uses a single neural network for multilingual TTP.
Title: A hybrid neural network/rule based system for bilingual text-to-phoneme mapping (pp. 345-354)
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1423015
Hong Guo, L. B. Jack, A. Nandi
Feature extraction is one of the major challenges in pattern recognition. It helps to maximise the useful information obtained from the raw data in order to make classification effective and simple. In this paper, a machine learning approach, genetic programming (GP), is employed to extract features from raw vibration data taken from a rotating machine under several different conditions. The created features are then used as the input to a simple ANN for the identification of different bearing conditions, and the results are compared with those of other classical machine learning methods. Experimental results demonstrate the capability of GP to automatically discover functional relationships in the raw vibration data and to give improved performance.
Title: Automated feature extraction using genetic programming for bearing condition monitoring (pp. 519-528)
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1422975
J. Kohlmorgen
Most real-world systems exhibit non-stationary behavior, e.g., slow drifts due to wear or fast changes due to external influences. Extracting and quantifying these phenomena is often difficult because a precise mathematical model of the underlying system is lacking. Here we propose to model such high-level changes of a dynamical system solely on the basis of the observed measurements rather than by modeling the underlying system itself. In particular, we present a method to track and visualize changes in general data distributions. We approach the problem of how to represent continuous changes in high-dimensional non-parametric distributions by identifying anchor distributions, and we model the transitions between those anchor distributions by defining a suitable similarity measure. Applications to a high-dimensional chaotic system and to a sleep-onset detection task in EEG demonstrate the efficiency of this approach.
Title: Tracking and visualization of changes in high-dimensional non-parametric distributions (pp. 203-212)
Pub Date: 2004-09-29; DOI: 10.1109/MLSP.2004.1423017
I. Buciu, Ioannis Pitas
In this paper, we present a novel algorithm for learning facial expressions in a supervised manner. This algorithm is derived from the local non-negative matrix factorization (LNMF) algorithm, which is an extension of the non-negative matrix factorization (NMF) method. We call the newly proposed algorithm discriminant non-negative matrix factorization (DNMF). Given an image database, all three algorithms decompose the database into basis images and their corresponding coefficients. This decomposition is computed differently by each method. The decomposition results are applied to facial images for the recognition of the six basic facial expressions. We found that our algorithm achieves a higher recognition rate than NMF and LNMF.
Title: A new sparse image representation algorithm applied to facial expression recognition (pp. 539-548)
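As background for the decomposition described in this abstract, plain NMF can be stated as follows. This is the widely used Euclidean form with multiplicative updates, given only as a reference point. The symbols $V$, $W$, and $H$ are assumptions introduced here, and the additional locality and discriminant terms that distinguish LNMF and DNMF are not reproduced, since the abstract does not specify them.

```latex
% V: non-negative data matrix whose columns are vectorised images
% W: basis images (columns), H: corresponding coefficients
V \approx W H, \qquad W \ge 0,\; H \ge 0,
\qquad
\min_{W,\,H \,\ge\, 0} \; \lVert V - W H \rVert_F^{2}

% Multiplicative updates (element-wise product and division):
H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H},
\qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```

Because every factor in the updates is non-negative, $W$ and $H$ remain non-negative if they are initialised that way. The three methods named in the abstract share this factorised form and differ in the cost function that is minimised.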