Speaker Recognition Using Spectral Dimension Features
Wen-Shiung Chen, Jr-Feng Huang
2009 Fourth International Multi-Conference on Computing in the Global Information Technology, 2009-08-23
DOI: 10.1109/ICCGI.2009.27
Biometric recognition is increasingly important owing to security applications worldwide. Mobile phones have become popular in recent years, so recognizing a speaker's identity by voice also has a potential role to play. This paper presents a speaker recognition scheme that combines a non-linear feature, named spectral dimension (SD), with Mel Frequency Cepstral Coefficients (MFCC). To improve the performance of the proposed scheme, the Mel scale is adopted for allocating sub-bands, and the pattern matching is trained with a Gaussian mixture model (GMM). Some problems related to spectral dimension are discussed, and a comparison with other simple spectral features is made. We observe that the proposed methods improve performance in different components. For instance, speaker verification combining MFCC with the proposed SD features achieves an equal error rate (EER) of 2.3140% with 32_Multi-GMM, a relative improvement of about 22% over the MFCC-only method, whose EER is 2.9631%.
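
The "about 22%" figure can be checked from the two EER values quoted in the abstract. The snippet below is only a minimal arithmetic sketch, assuming "relative improvement" means the EER reduction divided by the MFCC-only EER; the function and variable names are hypothetical and do not come from the paper.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the EER reduction as a fraction of the baseline EER.

    Assumption: this is the definition of "relative improvement"
    behind the abstract's figure; the paper may define it differently.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# EER values (in percent) as quoted in the abstract.
mfcc_only_eer = 2.9631
mfcc_sd_eer = 2.3140

reduction = relative_eer_reduction(mfcc_only_eer, mfcc_sd_eer)
print(f"{reduction:.1%}")  # prints 21.9%
```

Under that assumption the reduction is roughly 21.9%, which agrees with the rounded value stated in the abstract.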