The fixed-point optimization of mel frequency cepstrum coefficients for speech recognition
Ge Zhang, Jinghua Yin, Li-Yu Daisy Liu, Chao Yang
Proceedings of 2011 6th International Forum on Strategic Technology, 2011-09-15
DOI: 10.1109/IFOST.2011.6021229 (https://doi.org/10.1109/IFOST.2011.6021229)
Speech recognition is a computationally complex process, yet it is well suited to battery-powered devices such as mobile phones and personal digital assistants (PDAs). In particular, computing mel-scaled frequency cepstrum coefficients (MFCCs) is a dimension-reduction step that lets speech samples be described accurately with fewer resources. The optimized algorithm replaces the original Taylor-expansion computation with a binary-search-based look-up table, which reduces the per-frame execution time enough to meet the needs of a real-time speech recognition system. In this paper, the look-up tables were constructed by analysing the pseudo code so as to reduce their memory size. The conversion of floating-point MFCCs to fixed-point ones was investigated to achieve higher precision in the first-order (linear interpolation) approximation of the log function. The Hidden Markov Model Toolkit (HTK) was used to train on speech samples from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) corpus. With the optimized algorithm, the recognition rate of the speech recognition system improved by 12.02%.
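The abstract does not give the paper's word length, table layout, or pseudo code, so the following is only a minimal sketch of the general technique it names: a fixed-point logarithm (as used on mel filterbank energies in an MFCC front end) computed from a look-up table located by binary search, followed by first-order linear interpolation. The Q16 format, the 32-segment table, and the names `fixed_log`, `BREAKPOINTS`, and `LOG2_TABLE` are hypothetical choices, not the authors'. It is written in Python with integer-only arithmetic to model fixed-point behaviour; floating point is used only to precompute the tables offline.

```python
import bisect
import math

# Hypothetical parameters, not taken from the paper.
FRAC_BITS = 16                  # Q16: real value = integer / 2**16
ONE = 1 << FRAC_BITS
TABLE_SIZE = 32                 # number of linear-interpolation segments

# Breakpoints m_k = 1 + k/TABLE_SIZE in Q16, covering the mantissa range [1, 2].
BREAKPOINTS = [ONE + (k * ONE) // TABLE_SIZE for k in range(TABLE_SIZE + 1)]
# log2(m_k) in Q16, precomputed offline (the only use of floating point).
LOG2_TABLE = [round(math.log2(m / ONE) * ONE) for m in BREAKPOINTS]
LN2_Q16 = round(math.log(2) * ONE)      # ln(2) in Q16, converts log2 to ln


def fixed_log(x: int) -> int:
    """Approximate ln(x) for a positive integer x; the result is in Q16."""
    if x <= 0:
        raise ValueError("log is undefined for non-positive input")
    # Normalise x = 2**e * m with m in [1, 2); e is the integer part of log2(x).
    e = x.bit_length() - 1
    m = (x << FRAC_BITS) >> e           # mantissa in Q16, in [ONE, 2*ONE)
    # Binary search for k with BREAKPOINTS[k] <= m < BREAKPOINTS[k + 1].
    k = bisect.bisect_right(BREAKPOINTS, m) - 1
    x0, x1 = BREAKPOINTS[k], BREAKPOINTS[k + 1]
    y0, y1 = LOG2_TABLE[k], LOG2_TABLE[k + 1]
    # First-order (linear) interpolation between the two table entries.
    frac_log2 = y0 + ((y1 - y0) * (m - x0)) // (x1 - x0)
    log2_q16 = (e << FRAC_BITS) + frac_log2
    # ln(x) = log2(x) * ln(2); the product is Q32, so shift back to Q16.
    return (log2_q16 * LN2_Q16) >> FRAC_BITS


if __name__ == "__main__":
    for value in (3, 1000, 12345):
        approx = fixed_log(value) / ONE
        print(value, approx, math.log(value))
```

With uniformly spaced breakpoints the segment index could also be computed with a single shift; a binary search, as described in the abstract, has the advantage of working unchanged for non-uniform breakpoints, for example a table made denser near m = 1, where the curvature of the logarithm is largest and linear interpolation is least accurate.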