M. Priyanka, V. S. Solomi, P. Vijayalakshmi, Tushar Nagarajan
2013 International Conference on Recent Trends in Information Technology (ICRTIT), 2013-07-25. DOI: 10.1109/ICRTIT.2013.6844197
Multiresolution feature extraction (MRFE) based speech recognition system
A speech recognition system converts spoken utterances into text. Its accuracy depends on the acoustic models it uses, which are trained on features extracted from the available training data and then used to recognise the spoken text. In the conventional method, features are extracted with a single, fixed window size (e.g. 20 ms). Instead, we propose extracting features from the same speech signal using multiple window sizes. Each window size yields a separate set of feature vectors for the same word, thereby increasing the number of training examples. Experiments show that extracting features with multiple window sizes considerably increases the variation among the feature vectors, which leads to better acoustic models. This multiresolution feature extraction technique is successfully used to build a speech recogniser. To analyse its performance, an isolated-word speech recognition system is developed on the TIMIT speech corpus. Results show an improvement of around 8% in recognition accuracy over the conventional single-resolution feature extraction method.
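The idea of deriving several feature-vector sets from one utterance can be illustrated with a small sketch. It is not the authors' implementation: the abstract names only a 20 ms window and does not state the feature type, so the window sizes (15, 20 and 25 ms), the 10 ms hop, and the features themselves are assumptions. In place of a standard front end such as MFCCs, the sketch uses log energies pooled from a naive DFT power spectrum into a fixed number of bands. All function names (`frame_signal`, `log_band_energies`, `multiresolution_features`, `build_training_set`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# ASSUMPTION: hypothetical window sizes around the 20 ms value mentioned in the abstract.
WINDOW_SIZES_MS = (15.0, 20.0, 25.0)


def frame_signal(signal: Sequence[float], sample_rate: int,
                 window_ms: float, hop_ms: float) -> List[List[float]]:
    """Split a signal into overlapping frames of window_ms, advancing by hop_ms."""
    win = int(round(sample_rate * window_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    return [list(signal[s:s + win]) for s in range(0, len(signal) - win + 1, hop)]


def log_band_energies(frame: Sequence[float], num_bands: int) -> List[float]:
    """Hamming-window a frame, compute a naive DFT power spectrum and pool it
    into num_bands equal-width bands (a stand-in for MFCCs, NOT the paper's features).
    The output length is num_bands for any frame length, so every window size
    yields feature vectors of the same dimension."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    half = n // 2 + 1  # non-negative frequency bins 0 .. n/2
    power = []
    for k in range(half):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(windowed))
        power.append(re * re + im * im)
    edges = [b * half // num_bands for b in range(num_bands + 1)]
    # Floor the energy so silent bands do not produce log(0).
    return [math.log(max(sum(power[edges[b]:edges[b + 1]]), 1e-10))
            for b in range(num_bands)]


def multiresolution_features(signal: Sequence[float], sample_rate: int,
                             window_sizes_ms: Sequence[float] = WINDOW_SIZES_MS,
                             hop_ms: float = 10.0,
                             num_bands: int = 8) -> Dict[float, List[List[float]]]:
    """Return one feature-vector sequence per window size, all from the same signal."""
    return {w: [log_band_energies(f, num_bands)
                for f in frame_signal(signal, sample_rate, w, hop_ms)]
            for w in window_sizes_ms}


def build_training_set(utterances: Dict[str, List[Sequence[float]]], sample_rate: int,
                       window_sizes_ms: Sequence[float] = WINDOW_SIZES_MS
                       ) -> Dict[str, List[List[List[float]]]]:
    """Each utterance contributes len(window_sizes_ms) example sequences to its word."""
    training: Dict[str, List[List[List[float]]]] = {}
    for word, signals in utterances.items():
        examples = training.setdefault(word, [])
        for sig in signals:
            examples.extend(multiresolution_features(sig, sample_rate, window_sizes_ms).values())
    return training


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1250 * t / fs) for t in range(800)]  # 0.1 s, 1250 Hz
    for w, seq in multiresolution_features(tone, fs).items():
        print(w, len(seq), len(seq[0]))  # (window ms, frames, dimension)
    # 15.0 9 8
    # 20.0 9 8
    # 25.0 8 8
```

Pooling into a fixed number of bands is the design choice that lets sequences from different window sizes train the same word model: longer windows give finer frequency resolution and fewer frames, but every vector keeps the same dimension. With three window sizes, each recorded utterance contributes three examples instead of one.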