Perceptual MVDR-based unsupervised built-in speaker normalization for Kazakh speech recognition
Authors: Zhandos Yessenbayev, U. Yapanel
Published in: 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), 2014-10-01
DOI: https://doi.org/10.1109/ICAICT.2014.7035914
In this work we present a novel approach to unsupervised speaker normalization built on top of the Perceptual MVDR-based Built-in Speaker Normalization technique. We show that the proposed method can be effective for phonetic recognition on TIMIT and then apply it to Kazakh speech recognition. In our experiments, the method improves ASR performance by up to 20% relative. An analysis of the optimal warp factors selected by the algorithm reveals a clear separation by gender, which may be useful for gender/speaker classification tasks.
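The abstract does not spell out how the warp factor is chosen, so the following is only a minimal sketch of the general idea of per-speaker warp factor selection, under stated assumptions. It assumes a first-order all-pass (bilinear) frequency warp, which is a common choice for perceptual warping, and a plain grid search. The names `bilinear_warp`, `select_warp_factor`, the `score` callback, and the candidate range are all hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Sequence


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized frequency omega (radians, 0..pi) with a
    first-order all-pass transform controlled by alpha (|alpha| < 1).
    Assumption: this warp family stands in for the perceptual warp."""
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


def select_warp_factor(
    score: Callable[[float], float], candidates: Sequence[float]
) -> float:
    """Return the candidate warp factor with the highest score.
    `score` is a hypothetical callback, e.g. a function that extracts
    features with the given alpha and returns a model likelihood."""
    return max(candidates, key=score)


# Hypothetical candidate grid; the real search range is not given here.
candidates = [round(0.49 + 0.02 * i, 2) for i in range(9)]


# Toy score that peaks at 0.57, used only to exercise the search.
def toy_score(alpha: float) -> float:
    return -(alpha - 0.57) ** 2


best_alpha = select_warp_factor(toy_score, candidates)
```

In a real system the toy score would be replaced by whatever unsupervised criterion the recognizer provides. The selected factors could then be inspected per speaker, which is the kind of analysis the gender separation remark refers to.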