Leveraging the Objective Intelligibility and Noise Estimation to Improve Conformer-Based MetricGAN

Chia Dai, Wan-Ling Zeng, Jia-Xuan Zeng, J. Hung
DOI: 10.1109/ICASI57738.2023.10179495
2023 9th International Conference on Applied System Innovation (ICASI), published 2023-04-21

Abstract

Conformer-based MetricGAN (CMGAN) is a deep neural network (DNN)-based speech enhancement (SE) method that uses time-frequency (TF) domain features to learn a novel conformer-wise generative network, and it has demonstrated excellent SE performance on various perceptual evaluation metrics. In this study, we propose to revise CMGAN along three directions. First, we incorporate phone-fortified perceptual loss (PFPL) into its loss function. The PFPL is computed from latent representations of speech produced by the wav2vec module. Incorporating PFPL into the loss function allows perceptual and linguistic speech information to guide CMGAN training. Second, we revise the discriminator output by adding STOI values. The original discriminator is trained to estimate the PESQ score of the enhanced speech, taking both the clean and enhanced spectra as inputs together with the associated PESQ label; in other words, it considers only the PESQ score. By further considering STOI, we expect to improve the discriminator. Third, we add noise label estimation to the overall CMGAN framework. The original CMGAN only measures the disparity between the model's estimate and the clean target using clean labels. We further introduce a noise estimation loss, which measures the discrepancy between the predicted noise and the noise label. The Voicebank-Demand dataset is used for the evaluation experiments. According to the experimental results, the revised CMGAN outperforms the original by achieving higher scores on objective perceptual metrics, including PESQ and STOI. As a result, we confirm the effectiveness of the presented revisions to CMGAN.
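The three revisions described above can be sketched as loss terms. This is a minimal illustrative sketch, not the paper's implementation: the distance measures, the weights `alpha` and `beta`, and the PESQ normalization are assumptions (the abstract does not specify them), and the wav2vec latents are passed in as plain arrays.

```python
import numpy as np

def pfpl(latent_clean, latent_enhanced):
    # Phone-fortified perceptual loss: a distance between wav2vec
    # latent representations of the clean and enhanced speech
    # (L1 distance here; the paper's exact distance may differ).
    return np.mean(np.abs(latent_clean - latent_enhanced))

def noise_estimation_loss(noise_pred, noise_label):
    # Discrepancy between the noise predicted by the model and the
    # noise label (e.g. noisy input minus clean target); MSE here.
    return np.mean((noise_pred - noise_label) ** 2)

def generator_loss(base_loss, latent_clean, latent_enhanced,
                   noise_pred, noise_label, alpha=1.0, beta=1.0):
    # Revised generator objective: the original CMGAN terms
    # (collapsed into base_loss for brevity) plus the PFPL and
    # noise-estimation terms. alpha and beta are hypothetical weights.
    return (base_loss
            + alpha * pfpl(latent_clean, latent_enhanced)
            + beta * noise_estimation_loss(noise_pred, noise_label))

def discriminator_target(pesq, stoi):
    # Revised discriminator label: instead of the PESQ score alone,
    # train against both PESQ (mapped from [-0.5, 4.5] to [0, 1],
    # a common MetricGAN convention) and STOI (already in [0, 1]).
    return np.array([(pesq + 0.5) / 5.0, stoi])
```

With identical clean and enhanced latents and a perfect noise estimate, both added terms vanish and the objective reduces to the base CMGAN loss, so the revisions only penalize perceptual and noise-prediction mismatches.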