Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a Conditional Diffusion Model

Zongyang Du, Junchen Lu, Kun Zhou, Lakshmish Kaushik, Berrak Sisman

arXiv - CS - Sound, 2024-05-02
DOI: arxiv-2405.01730 (https://doi.org/arxiv-2405.01730)

Abstract

Expressive voice conversion (VC) performs speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been extensively explored. Previous approaches have relied on vocoders for speech reconstruction, which makes speech quality heavily dependent on vocoder performance. A further major challenge of expressive VC lies in modeling emotional prosody. To address these challenges, this paper proposes a fully end-to-end expressive VC framework based on a conditional denoising diffusion probabilistic model (DDPM). We use speech units derived from self-supervised speech models as content conditioning, along with deep features extracted from speech emotion recognition and speaker verification systems to model emotional style and speaker identity. Objective and subjective evaluations show the effectiveness of our framework. Code and samples are publicly available.
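As a rough illustration of the conditioning idea described in the abstract, the following minimal sketch shows a generic DDPM forward process and noise-prediction loss, with content, emotion, and speaker features concatenated into one conditioning input. It is not the authors' implementation: the linear noise schedule, all feature dimensions, the frame-aligned toy signal, and the names `build_condition`, `training_loss`, and `dummy_predictor` are assumptions made purely for illustration.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (step count and endpoints are assumed values)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def build_condition(content_feats, emotion_emb, speaker_emb):
    """Repeat utterance-level embeddings over frames and concatenate with content features."""
    num_frames = content_feats.shape[0]
    emo = np.tile(emotion_emb, (num_frames, 1))
    spk = np.tile(speaker_emb, (num_frames, 1))
    return np.concatenate([content_feats, emo, spk], axis=1)

def training_loss(noise_predictor, x0, cond, alpha_bar, rng):
    """One DDPM training step: mean squared error between true and predicted noise."""
    t = int(rng.integers(0, len(alpha_bar)))
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bar, noise)
    pred = noise_predictor(x_t, t, cond)
    return float(np.mean((noise - pred) ** 2))

def dummy_predictor(x_t, t, cond):
    """Hypothetical stand-in for a neural noise predictor; always predicts zero noise."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention factor per step

# Toy inputs with assumed sizes: 50 frames, 64-dim content features,
# 8-dim emotion embedding, 16-dim speaker embedding.
content_feats = rng.standard_normal((50, 64))
emotion_emb = rng.standard_normal(8)
speaker_emb = rng.standard_normal(16)
cond = build_condition(content_feats, emotion_emb, speaker_emb)

x0 = rng.standard_normal((50, 1))  # placeholder for the target speech signal
loss = training_loss(dummy_predictor, x0, cond, alpha_bar, rng)
print(cond.shape, loss)
```

In a real system the placeholder predictor would be a trained network, and the content, emotion, and speaker features would come from pretrained self-supervised, speech emotion recognition, and speaker verification models respectively, as the abstract describes.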