Applications of MFCC and Vector Quantization in speaker recognition
A. Gupta, H. Gupta
2013 International Conference on Intelligent Systems and Signal Processing (ISSP), March 2013
DOI: 10.1109/ISSP.2013.6526896 (https://doi.org/10.1109/ISSP.2013.6526896)
In speaker recognition, most of the computation comes from the likelihood computations between the feature vectors of the unknown speaker and the models in the database. In this paper, we concentrate on optimizing Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and Vector Quantization (VQ) for feature modeling. We reduce the number of feature vectors by pre-quantizing the test sequence prior to matching, and the number of candidate speakers by ruling out unlikely speakers during the recognition process. The two important performance measures, the recognition rate and the minimized average distance between the samples (the distortion), depend on the codebook size and the number of cepstral coefficients. We find that this approach yields significant performance gains when the number of MFCCs and the codebook size are tuned. The recognition rate is found to reach up to 89%, and the distortion is reduced by up to 69%.
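The matching scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: MFCC vectors are assumed to be already extracted into NumPy arrays of shape (frames, coefficients), plain k-means stands in for whatever codebook training procedure the paper uses, and all function names (`train_codebook`, `average_distortion`, `prequantize`, `identify`) are hypothetical. Speaker pruning is not shown.

```python
import numpy as np


def train_codebook(features, size, iters=20, seed=0):
    """Build a VQ codebook with plain k-means (an assumed stand-in for
    the paper's training method). features: (n_frames, n_coeffs)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training frames.
    codebook = features[rng.choice(len(features), size, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every codeword.
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        for k in range(size):
            members = features[nearest == k]
            if len(members):  # leave a codeword unchanged if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).mean())


def prequantize(test_features, n_vectors):
    """Shrink the test sequence to n_vectors centroids before matching."""
    return train_codebook(test_features, n_vectors)


def identify(test_features, codebooks):
    """Return the speaker whose codebook gives the lowest average
    distortion, together with the per-speaker scores."""
    scores = {name: average_distortion(test_features, cb)
              for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores
```

In this sketch the two quantities the abstract varies appear directly as parameters: the codebook size is the `size` argument, and the number of cepstral coefficients is the second dimension of the feature arrays. Calling `identify(prequantize(test, 16), codebooks)` instead of `identify(test, codebooks)` cuts the number of distance computations at the cost of some matching precision.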