{"title":"语音识别的多源神经网络","authors":"R. Gemello, D. Albesano, F. Mana","doi":"10.1109/IJCNN.1999.835942","DOIUrl":null,"url":null,"abstract":"In speech recognition the most diffused technology (hidden Markov models) is constrained by the condition of stochastic independence of its input features. That limits the simultaneous use of features derived from the speech signal with different processing algorithms. On the contrary artificial neural networks (ANN) are capable of incorporating multiple heterogeneous input features, which do not need to be treated as independent, finding the optimal combination of these features for classification. The purpose of this work is the exploitation of this characteristic of ANNs to improve the speech recognition accuracy through the combined use of input features coming from different sources (different feature extraction algorithms). We integrate two input sources: the Mel based cepstral coefficients (MFCC) derived from FFT and the RASTA-PLP cepstral coefficients. The results show that this integration leads to an error reduction of 26% on a telephone quality test set.","PeriodicalId":157719,"journal":{"name":"IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Multi-source neural networks for speech recognition\",\"authors\":\"R. Gemello, D. Albesano, F. Mana\",\"doi\":\"10.1109/IJCNN.1999.835942\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In speech recognition the most diffused technology (hidden Markov models) is constrained by the condition of stochastic independence of its input features. That limits the simultaneous use of features derived from the speech signal with different processing algorithms. On the contrary artificial neural networks (ANN) are capable of incorporating multiple heterogeneous input features, which do not need to be treated as independent, finding the optimal combination of these features for classification. The purpose of this work is the exploitation of this characteristic of ANNs to improve the speech recognition accuracy through the combined use of input features coming from different sources (different feature extraction algorithms). We integrate two input sources: the Mel based cepstral coefficients (MFCC) derived from FFT and the RASTA-PLP cepstral coefficients. The results show that this integration leads to an error reduction of 26% on a telephone quality test set.\",\"PeriodicalId\":157719,\"journal\":{\"name\":\"IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-07-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJCNN.1999.835942\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.1999.835942","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-source neural networks for speech recognition
In speech recognition the most diffused technology (hidden Markov models) is constrained by the condition of stochastic independence of its input features. That limits the simultaneous use of features derived from the speech signal with different processing algorithms. On the contrary artificial neural networks (ANN) are capable of incorporating multiple heterogeneous input features, which do not need to be treated as independent, finding the optimal combination of these features for classification. The purpose of this work is the exploitation of this characteristic of ANNs to improve the speech recognition accuracy through the combined use of input features coming from different sources (different feature extraction algorithms). We integrate two input sources: the Mel based cepstral coefficients (MFCC) derived from FFT and the RASTA-PLP cepstral coefficients. The results show that this integration leads to an error reduction of 26% on a telephone quality test set.