Novel Pre-processing using Outlier Removal in Voice Conversion

S. Rao, Nirmesh J. Shah, H. Patil
Speech Synthesis Workshop (published 2016-09-13)
DOI: 10.21437/SSW.2016-22
Cited by: 13

Abstract

Voice conversion (VC) modifies an utterance spoken by a source speaker so that it sounds as if a target speaker had spoken it. Gaussian Mixture Model (GMM)-based VC is a state-of-the-art method. It models the joint density of source and target spectral features with a GMM and uses the resulting mapping function to convert spectral features frame by frame. As with any real dataset, the spectral parameters contain a few points that are inconsistent with the rest of the data, called outliers. So far, there has been very little literature on the effect of outliers in voice conversion. In this paper, we explore that effect by introducing outlier removal as a pre-processing step. To identify the outliers, we use the score distance, which is computed from the scores estimated by Robust Principal Component Analysis (ROBPCA). A point is declared an outlier when its score distance exceeds a cut-off value derived from a chi-squared distribution with the appropriate degrees of freedom. The outliers are then removed from the training dataset, and the GMM is trained on the least outlying points. This pre-processing step can be applied to various VC methods. Experimental results show a clear improvement in both the objective (8%) and the subjective (4% for MOS and 5% for XAB) evaluations.
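The outlier-removal step described above can be sketched in NumPy. This is a simplified stand-in, not ROBPCA itself: full ROBPCA combines projection pursuit with a minimum covariance determinant (MCD) estimate, whereas this sketch fits principal axes on the h points closest to the coordinate-wise median and rescales each score with the normalised MAD. For each frame i it computes the score distance SD_i = sqrt(sum_{j=1..k} t_ij^2 / l_j), where t_ij is the j-th principal-component score and l_j a robust variance of component j, and flags the frame when SD_i exceeds sqrt(chi2_{k, 0.975}), the cut-off conventionally used with ROBPCA score distances. The 0.975 level, the number of components k, the subset fraction alpha = 0.75, and the function names are illustrative assumptions; the abstract does not specify them. The chi-squared quantile uses the Wilson-Hilferty approximation so the sketch needs only NumPy.

```python
import numpy as np

# Standard-normal quantile for p = 0.975 (assumed cut-off level).
Z_0975 = 1.959963984540054


def chi2_quantile_wh(df, z=Z_0975):
    """Wilson-Hilferty approximation of the chi-squared quantile with df dof."""
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * np.sqrt(h)) ** 3


def score_distance_outliers(X, k, alpha=0.75):
    """Return (score_distances, cutoff, outlier_mask) for the rows of X.

    Hypothetical, simplified robust-PCA stand-in (NOT full ROBPCA):
      1. keep the h = ceil(alpha * n) rows closest to the coordinate-wise median,
      2. use the top-k eigenvectors of their covariance as principal axes,
      3. estimate each component's variance robustly via the normalised MAD.
    """
    n = X.shape[0]
    h = int(np.ceil(alpha * n))
    med = np.median(X, axis=0)
    # Rows nearest the median form the "clean" subset used to fit the axes.
    nearest = np.argsort(np.linalg.norm(X - med, axis=1))[:h]
    subset = X[nearest]
    center = subset.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(subset, rowvar=False))
    axes = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # (d, k), largest first
    scores = (X - center) @ axes  # (n, k) score matrix t_ij
    # Robust per-component variance l_j = (1.4826 * MAD_j)^2.
    mad = np.median(np.abs(scores - np.median(scores, axis=0)), axis=0)
    l = (1.4826 * mad) ** 2
    sd = np.sqrt(np.sum(scores ** 2 / l, axis=1))
    cutoff = np.sqrt(chi2_quantile_wh(k))
    return sd, cutoff, sd > cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 500 inliers with 2-D structure embedded in 4-D, plus noise.
    inliers = np.zeros((500, 4))
    inliers[:, :2] = rng.normal(size=(500, 2)) * [3.0, 2.0]
    inliers += 0.1 * rng.normal(size=(500, 4))
    # 10 gross outliers placed far out inside the same 2-D subspace.
    outliers = np.array([30.0, 30.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(10, 4))
    X = np.vstack([inliers, outliers])
    sd, cutoff, is_outlier = score_distance_outliers(X, k=2)
    X_train = X[~is_outlier]  # frames that would be passed on to GMM training
    print(f"cutoff={cutoff:.3f}, flagged={is_outlier.sum()}, kept={len(X_train)}")
```

Only the rows with `is_outlier == False` would then be used to train the joint-density GMM, for example with scikit-learn's `GaussianMixture(n_components=..., covariance_type="full")`. Which feature vectors the paper screens (source, target, or aligned joint source-target vectors) is not stated in the abstract, so the choice of `X` here is an assumption.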