Investigating Training Objectives for Generative Speech Enhancement

arXiv - EE - Audio and Speech Processing, Pub Date: 2024-09-16, DOI: arxiv-2409.10753

Julius Richter, Danilo de Oliveira, Timo Gerkmann


Citations: 0

Abstract



Investigating Training Objectives for Generative Speech Enhancement

Generative speech enhancement has recently shown promising advancements in improving speech quality in noisy environments. Multiple diffusion-based frameworks exist, each employing distinct training objectives and learning techniques. This paper aims to explain the differences between these frameworks by focusing our investigation on score-based generative models and the Schrödinger bridge. We conduct a series of comprehensive experiments to compare their performance and highlight differing training behaviors. Furthermore, we propose a novel perceptual loss function tailored for the Schrödinger bridge framework, demonstrating enhanced performance and improved perceptual quality of the enhanced speech signals. All experimental code and pre-trained models are publicly available to facilitate further research and development in this area.

