Speech recognition using dynamic neural networks
N. M. Botros, S. Premnath
[Proceedings 1992] IJCNN International Joint Conference on Neural Networks, 1992-06-07
DOI: 10.1109/IJCNN.1992.227230
The authors present an algorithm for isolated-word recognition that accounts for variation in duration among different utterances of the same word. Acoustic features are extracted from the speech signal and used as the input to a sequence of multilayer perceptron neural networks, each implemented as a predictor of the speech samples over a certain time interval. The networks were trained using a combination of back-propagation and dynamic time warping (DTW), with DTW used to normalize the duration variability. The networks were trained both to recognize the correct words and to reject incorrect ones. The training set consisted of ten words, each uttered seven times by three different speakers; the test set consisted of three utterances of each of the ten words. The results show that all ten words could be recognized.
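The abstract does not specify the network architecture, the acoustic features, the prediction context, or the exact DTW recursion, so the following Python sketch only illustrates one plausible way to combine predictor MLPs with DTW. It is not the authors' implementation. Each word is modelled by a sequence of small MLPs, and each MLP predicts the current feature frame from the previous one. DTW assigns frames to the predictors in order, so utterances of different lengths map onto the same sequence. Training alternates DTW alignment with back-propagation on the aligned frames. The names `PredictorMLP`, `align`, `train_word` and `recognize`, the one-frame prediction context, and the squared-error distance are all hypothetical choices. The training to reject wrong words described in the abstract is not modelled.

```python
import math
import random


class PredictorMLP:
    """Hypothetical one-hidden-layer perceptron predicting frame x[t] from x[t-1]."""

    def __init__(self, dim, hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]
        self.b2 = [0.0] * dim

    def forward(self, x):
        # tanh hidden layer, linear output layer
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def error(self, prev, target):
        """Squared prediction error, used here as the local DTW distance (assumption)."""
        _, y = self.forward(prev)
        return sum((yi - ti) ** 2 for yi, ti in zip(y, target))

    def backprop(self, prev, target, lr):
        """One back-propagation step on the squared prediction error."""
        h, y = self.forward(prev)
        dy = [2.0 * (yi - ti) for yi, ti in zip(y, target)]
        # Hidden-layer gradient, computed before the output weights are changed.
        dh = [(1.0 - h[j] ** 2) * sum(dy[i] * self.w2[i][j] for i in range(len(dy)))
              for j in range(len(h))]
        for i, g in enumerate(dy):
            for j, hj in enumerate(h):
                self.w2[i][j] -= lr * g * hj
            self.b2[i] -= lr * g
        for j, g in enumerate(dh):
            for k, xk in enumerate(prev):
                self.w1[j][k] -= lr * g * xk
            self.b1[j] -= lr * g


def align(frames, nets):
    """DTW: map frames 1..T-1 onto the K predictors in order, each used at least
    once, minimising total prediction error. Returns (cost, predictor per frame)."""
    T, K = len(frames), len(nets)
    if T - 1 < K:
        raise ValueError("need at least one predicted frame per predictor")
    INF = float("inf")
    cost = [[INF] * K for _ in range(T)]  # cost[t][k]: frame t handled by predictor k
    back = [[0] * K for _ in range(T)]    # predictor that handled frame t-1
    for t in range(1, T):  # frame 0 has no predecessor, so it is never predicted
        for k in range(K):
            e = nets[k].error(frames[t - 1], frames[t])
            if t == 1:
                cost[t][k] = e if k == 0 else INF  # alignment starts at predictor 0
                continue
            stay = cost[t - 1][k]                           # keep the same predictor
            advance = cost[t - 1][k - 1] if k > 0 else INF  # move to the next one
            back[t][k] = k if stay <= advance else k - 1
            cost[t][k] = min(stay, advance) + e
    path, k = [], K - 1
    for t in range(T - 1, 0, -1):  # trace back from the last frame and predictor
        path.append(k)
        k = back[t][k]
    return cost[T - 1][K - 1], path[::-1]


def train_word(utterances, nets, epochs=30, lr=0.05):
    """Alternate DTW alignment and back-propagation on the aligned frames."""
    for _ in range(epochs):
        for frames in utterances:
            _, path = align(frames, nets)
            for t, k in enumerate(path, start=1):
                nets[k].backprop(frames[t - 1], frames[t], lr)


def recognize(frames, models):
    """Return the word whose predictor sequence gives the lowest aligned error."""
    return min(models, key=lambda word: align(frames, models[word])[0])
```

In use, one would call `train_word` once per vocabulary word on that word's training utterances, then pass a test utterance to `recognize`. Because the recursion only lets each frame stay on the current predictor or advance to the next, every predictor covers at least one frame. This is how the sketch absorbs differences in utterance duration.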