Recognition of isolated words of esophageal speech using GMM and gradient descent RBF networks
P. Malathi, G. Suresh
2014 International Conference on Communication and Network Technologies, 2014-12-01
DOI: 10.1109/CNT.2014.7062749 (https://doi.org/10.1109/CNT.2014.7062749)
Citations: 2
Abstract
A speech signal can be represented as a combination of acoustic parameters extracted from it. The parameter vectors are assumed to describe the speech signal over a specified duration during which it is stationary. Typical representations are Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Coefficients (LPC). Isolated word recognition involves mapping these parameters to speech, but a direct mapping is not possible because the realized speech waveform varies widely with speaker, modulation, context, etc. The parametric speech vectors are modeled using a Gaussian Mixture Model (GMM), and their distribution is observed. The Expectation Maximisation algorithm is used in the Radial Basis Function (RBF) network to best fit the test vector. A gradient descent algorithm applied to the RBF neural network is proposed to approximate highly nonlinear functions. The learning rates of the network are made proportional to the probability densities obtained from the GMM. Isolated words of esophageal speech appear to be recognized better with this method than with previous methods, since such speech contains nonlinear components.
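
The abstract gives no implementation details, so the following is only a minimal sketch of one way the described pipeline could be put together, not the authors' method. It fits a GMM by EM, reuses the component means as RBF centres, and trains the RBF output weights by gradient descent, with each sample's learning rate scaled by its GMM density. All names (`fit_gmm_em`, `train_rbf`, `run_demo`), the diagonal covariances, the farthest-point initialisation, the choice to train only the output layer, and the synthetic 2-D data standing in for MFCC or LPC vectors are assumptions made for illustration.

```python
import numpy as np


def component_log_densities(X, weights, means, variances):
    """Log of (mixing weight * diagonal Gaussian density), shape (n_samples, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    return np.log(weights) + log_gauss


def fit_gmm_em(X, k, n_iter=50):
    """Fit a diagonal-covariance GMM with plain EM (assumed model form)."""
    # Deterministic farthest-point initialisation of the means (an assumption).
    means = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[np.argmax(d2)])
    means = np.array(means)
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        log_p = component_log_densities(X, weights, means, variances)
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        sq = (X[:, None, :] - means[None, :, :]) ** 2
        variances = np.einsum("nk,nkd->kd", resp, sq) / nk[:, None] + 1e-6
    return weights, means, variances


def gmm_density(X, weights, means, variances):
    """Mixture probability density evaluated at each row of X."""
    return np.exp(component_log_densities(X, weights, means, variances)).sum(axis=1)


def rbf_features(X, centers, widths):
    """Gaussian RBF activations, shape (n_samples, n_centers)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths ** 2))


def train_rbf(X, Y, centers, widths, density, base_lr=0.5, epochs=200):
    """Gradient descent on the output weights only (a simplifying assumption)."""
    Phi = rbf_features(X, centers, widths)
    W = np.zeros((Phi.shape[1], Y.shape[1]))
    # Per-sample learning rate proportional to the GMM density.
    lr = base_lr * density / density.max()
    for _ in range(epochs):
        err = Phi @ W - Y
        W -= Phi.T @ (lr[:, None] * err) / len(X)
    return W


def make_words(seed, n_per_word=100):
    """Synthetic 2-D stand-in for feature vectors of two isolated words."""
    rng = np.random.default_rng(seed)
    centres = np.array([[-2.0, -2.0], [2.0, 2.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((n_per_word, 2)) for c in centres])
    return X, np.repeat(np.arange(2), n_per_word)


def run_demo():
    """Train on one synthetic draw, return accuracy on a second draw."""
    X_train, y_train = make_words(seed=0)
    X_test, y_test = make_words(seed=1)
    weights, means, variances = fit_gmm_em(X_train, k=2)
    widths = np.sqrt(variances.mean(axis=1))
    density = gmm_density(X_train, weights, means, variances)
    W = train_rbf(X_train, np.eye(2)[y_train], means, widths, density)
    pred = (rbf_features(X_test, means, widths) @ W).argmax(axis=1)
    return float((pred == y_test).mean())


if __name__ == "__main__":
    print("test accuracy:", run_demo())
```

Scaling the step size by density means that samples in well-populated regions of the feature space drive the weight updates, while outlying frames contribute little. Whether this matches the paper's exact update rule cannot be determined from the abstract alone.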