基于ISODATA聚类算法的高质量语音转换

2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE) Pub Date : 2017-11-01 DOI:10.1109/ISKE.2017.8258822

Yanping Li, Yutao Zuo, Zhen Yang, Xi Shao

{"title":"基于ISODATA聚类算法的高质量语音转换","authors":"Yanping Li, Yutao Zuo, Zhen Yang, Xi Shao","doi":"10.1109/ISKE.2017.8258822","DOIUrl":null,"url":null,"abstract":"Two main challenges introduced in current voice conversion are the dependence on parallel training data and the trade-off between speaker similarity and speech quality. To tackle the latter problem, this paper proposes a novel conversion method based on Iterative Self-organizing DATA Analysis Techniques Algorithm (ISODATA) clustering algorithm. Specially, we use ISODATA during the training of Gaussian mixture model, the optimized mixture number can guarantee the validity and accuracy of the GMM model, which can acquire speaker's identity effectively related to speaker similarity between original target speech and converted speech, Next, we combine improved GMM and bilinear frequency warping for the conversion stage, which can get a good balance between speaker similarity and speech quality. Theory analysis and experimental results demonstrate that the proposed algorithm can achieve higher quality and similarity compared with other two methods.","PeriodicalId":208009,"journal":{"name":"2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"High quality voice conversion based on ISODATA clustering algorithm\",\"authors\":\"Yanping Li, Yutao Zuo, Zhen Yang, Xi Shao\",\"doi\":\"10.1109/ISKE.2017.8258822\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Two main challenges introduced in current voice conversion are the dependence on parallel training data and the trade-off between speaker similarity and speech quality. To tackle the latter problem, this paper proposes a novel conversion method based on Iterative Self-organizing DATA Analysis Techniques Algorithm (ISODATA) clustering algorithm. Specially, we use ISODATA during the training of Gaussian mixture model, the optimized mixture number can guarantee the validity and accuracy of the GMM model, which can acquire speaker's identity effectively related to speaker similarity between original target speech and converted speech, Next, we combine improved GMM and bilinear frequency warping for the conversion stage, which can get a good balance between speaker similarity and speech quality. Theory analysis and experimental results demonstrate that the proposed algorithm can achieve higher quality and similarity compared with other two methods.\",\"PeriodicalId\":208009,\"journal\":{\"name\":\"2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISKE.2017.8258822\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISKE.2017.8258822","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

当前语音转换面临的两个主要挑战是对并行训练数据的依赖以及说话人相似度和语音质量之间的权衡。针对后一个问题，本文提出了一种基于迭代自组织数据分析技术(ISODATA)聚类算法的转换方法。其中，在高斯混合模型的训练过程中使用了ISODATA，优化后的混合数可以保证GMM模型的有效性和准确性，有效地获取原始目标语音和转换后语音之间与说话人相似度相关的说话人身份，然后在转换阶段将改进的GMM与双线性频率扭曲相结合，在说话人相似度和语音质量之间取得了很好的平衡。理论分析和实验结果表明，与其他两种方法相比，该算法可以获得更高的质量和相似度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

High quality voice conversion based on ISODATA clustering algorithm

Two main challenges introduced in current voice conversion are the dependence on parallel training data and the trade-off between speaker similarity and speech quality. To tackle the latter problem, this paper proposes a novel conversion method based on Iterative Self-organizing DATA Analysis Techniques Algorithm (ISODATA) clustering algorithm. Specially, we use ISODATA during the training of Gaussian mixture model, the optimized mixture number can guarantee the validity and accuracy of the GMM model, which can acquire speaker's identity effectively related to speaker similarity between original target speech and converted speech, Next, we combine improved GMM and bilinear frequency warping for the conversion stage, which can get a good balance between speaker similarity and speech quality. Theory analysis and experimental results demonstrate that the proposed algorithm can achieve higher quality and similarity compared with other two methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE)

自引率

0.00%

发文量