{"title":"Modifying LSTM Posteriors with Manner of Articulation Knowledge to Improve Speech Recognition Performance","authors":"Pradeep Rengaswamy, K. S. Rao","doi":"10.1109/ICMLA.2018.00122","DOIUrl":null,"url":null,"abstract":"The variant of recurrent neural networks (RNN) such as long short-term memory (LSTM) is successful in sequence modelling such as automatic speech recognition (ASR) framework. However the decoded sequence is prune to have false substitutions, insertions and deletions. We exploit the spectral flatness measure (SFM) computed on the magnitude linear prediction (LP) spectrum to detect two broad manners of articulation namely sonorants and obstruents. In this paper, we modify the posteriors generated at the output layer of LSTM according to the manner of articulation detection. The modified posteriors are given to the conventional decoding graph to minimize the false substitutions and insertions. The proposed method decreased the phone error rate (PER) by nearly 0.7 % and 0.3 % when evaluated on core TIMIT test corpus as compared to the conventional decoding involved in the deep neural networks (DNN) and the state of the art LSTM respectively.","PeriodicalId":6533,"journal":{"name":"2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"13 19","pages":"769-772"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2018.00122","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Variants of recurrent neural networks (RNNs), such as long short-term memory (LSTM) networks, are successful in sequence-modelling tasks such as automatic speech recognition (ASR). However, the decoded sequence is prone to false substitutions, insertions, and deletions. We exploit the spectral flatness measure (SFM), computed on the magnitude linear prediction (LP) spectrum, to detect two broad manners of articulation, namely sonorants and obstruents. In this paper, we modify the posteriors generated at the output layer of the LSTM according to the detected manner of articulation. The modified posteriors are passed to the conventional decoding graph to minimize false substitutions and insertions. The proposed method decreased the phone error rate (PER) by nearly 0.7% and 0.3% on the core TIMIT test corpus, compared to conventional decoding with deep neural networks (DNNs) and the state-of-the-art LSTM, respectively.
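The pipeline described above can be sketched in two steps: computing the SFM (the ratio of the geometric to the arithmetic mean of a magnitude spectrum, near 1 for flat, noise-like spectra such as obstruents and near 0 for peaky, harmonic spectra such as sonorants), and reweighting LSTM posteriors toward the detected manner class. This is a minimal illustration, not the paper's implementation: the function names, the `penalty` factor, and the use of a precomputed magnitude spectrum (rather than an LP spectrum estimated from the signal) are all assumptions.

```python
import numpy as np

def spectral_flatness(mag_spectrum, eps=1e-10):
    """Spectral flatness measure: geometric mean / arithmetic mean of
    the magnitude spectrum. Close to 1 for flat (noise-like) spectra,
    close to 0 for peaky (harmonic) spectra. `eps` guards log(0)."""
    x = np.asarray(mag_spectrum, dtype=float) + eps
    return np.exp(np.mean(np.log(x))) / np.mean(x)

def reweight_posteriors(posteriors, sonorant_mask, frame_is_sonorant,
                        penalty=0.1):
    """Scale down posteriors of phone classes inconsistent with the
    detected manner of articulation, then renormalize so the frame's
    posteriors still sum to 1. `penalty` is an illustrative value,
    not taken from the paper."""
    mask = sonorant_mask if frame_is_sonorant else ~sonorant_mask
    p = np.asarray(posteriors, dtype=float).copy()
    p[~mask] *= penalty          # suppress inconsistent classes
    return p / p.sum()           # renormalize to a distribution
```

A frame whose SFM falls below some threshold would be labelled sonorant, and its posteriors reweighted before being handed to the decoding graph; the threshold choice is not specified in the abstract.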