Improved Normalizing Flow-Based Speech Enhancement Using an all-Pole Gammatone Filterbank for Conditional Input Representation

Martin Strauss, Matteo Torcoli, B. Edler
Published in: 2022 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/SLT54892.2023.10022898 (https://doi.org/10.1109/SLT54892.2023.10022898)
Publication date: 2022-10-21
Citations: 1

Abstract

Deep generative models for Speech Enhancement (SE) have received increasing attention in recent years. The most prominent examples are Generative Adversarial Networks (GANs), while normalizing flows (NF) have received less attention despite their potential. Building on previous work, architectural modifications are proposed, along with an investigation of different conditional input representations. Despite being a common choice in related work, Mel-spectrograms prove inadequate for the given scenario. As an alternative, a novel All-Pole Gammatone filterbank (APG) with high temporal resolution is proposed. Although computational evaluation metrics suggest that state-of-the-art GAN-based methods perform best, a perceptual evaluation via a listening test indicates that the presented NF approach (based on the time domain and APG) performs best, especially at lower SNRs. On average, APG outputs are rated as having good quality, which is unmatched by the other methods, including the GAN.
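The abstract does not give the design of the proposed APG, so the following is only a generic sketch of the idea behind an all-pole gammatone filter: a gammatone-like band-pass response obtained by cascading identical two-pole resonators. The function names, the ERB bandwidth formula, the 1.019 bandwidth factor (borrowed from conventional fourth-order gammatone designs), the filter order, and the centre frequencies are all assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math


def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg-Moore formula); an assumed choice."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def resonator_coeffs(fc_hz, fs_hz, bw_scale=1.019):
    """Feedback coefficients (a1, a2) of one two-pole stage with poles r*exp(+-j*theta)."""
    theta = 2.0 * math.pi * fc_hz / fs_hz
    r = math.exp(-2.0 * math.pi * bw_scale * erb_hz(fc_hz) / fs_hz)
    return -2.0 * r * math.cos(theta), r * r, theta


def apg_channel(x, fc_hz, fs_hz, order=4):
    """Filter x with `order` identical two-pole stages (no zeros), unit gain at fc_hz."""
    a1, a2, theta = resonator_coeffs(fc_hz, fs_hz)
    # Gain of a single stage at the centre frequency, used for normalisation.
    z = cmath.exp(-1j * theta)
    stage_gain = abs(1.0 / (1.0 + a1 * z + a2 * z * z))
    scale = stage_gain ** (-order)
    y = [scale * s for s in x]
    for _ in range(order):
        y1 = y2 = 0.0
        out = []
        for s in y:
            v = s - a1 * y1 - a2 * y2  # all-pole difference equation
            out.append(v)
            y2, y1 = y1, v
        y = out
    return y


def apg_filterbank(x, fs_hz, centre_freqs_hz, order=4):
    """Return one filtered signal per centre frequency, each at the input sample rate."""
    return [apg_channel(x, fc, fs_hz, order) for fc in centre_freqs_hz]
```

In this sketch every channel keeps the sample rate of the input rather than being reduced to frames, which is one way a filterbank representation can retain more temporal detail than a framed Mel-spectrogram. Whether the paper's APG is built this way is not stated in the abstract.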