High quality voice conversion based on ISODATA clustering algorithm
Authors: Yanping Li, Yutao Zuo, Zhen Yang, Xi Shao
Published in: 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 2017-11-01
DOI: https://doi.org/10.1109/ISKE.2017.8258822
Two main challenges in current voice conversion are the dependence on parallel training data and the trade-off between speaker similarity and speech quality. To tackle the latter problem, this paper proposes a novel conversion method based on the Iterative Self-Organizing Data Analysis Techniques Algorithm (ISODATA) clustering algorithm. Specifically, we use ISODATA during the training of the Gaussian mixture model (GMM) to optimize the number of mixture components. The optimized mixture number can ensure the validity and accuracy of the GMM, which in turn captures the speaker's identity effectively and so improves the similarity between the original target speech and the converted speech. Next, we combine the improved GMM with bilinear frequency warping in the conversion stage, which achieves a good balance between speaker similarity and speech quality. Theoretical analysis and experimental results demonstrate that the proposed algorithm achieves higher quality and similarity than two other methods.