The preliminary study of improving the DPTNet speech enhancement system by adjusting its encoder and loss function

Yu-Yu Hsiao, Ming-Hsuan Wu, Kuan-Yu Tsai, J. Hung
Published in: 2022 8th International Conference on Applied System Innovation (ICASI), 2022-04-22
DOI: 10.1109/ICASI55125.2022.9774458 (https://doi.org/10.1109/ICASI55125.2022.9774458)

Abstract

This study analyzes a well-known speech enhancement (SE) method, the Dual-Path Transformer Network (DPTNet), and revises its architecture to improve performance. DPTNet consists of three parts: an encoder, a separation layer, and a decoder. The encoder extracts features from the input speech signal. The separation layer consists mainly of two improved Transformers that apply masks to the encoded features to separate speech from noise. Finally, the decoder reconstructs the speech signal from the masked features. We modify DPTNet in two ways. First, we concatenate time- and frequency-domain features and pass them through a bottleneck block to create a compact feature representation. Second, we test several widely used loss functions at the decoder output and find that the hybrid loss used in another deep SE network, DEMUCS, performs best. With these two changes, the system achieves a PESQ of 2.85 and a STOI of 0.945 on the VoiceBank-DEMAND test set; these metrics reflect speech quality and intelligibility, respectively.
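The two modifications described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer sizes or loss weights, so the frame size, the random matrices standing in for learned weights (`w_time`, `w_bottleneck`), the three STFT resolutions, and the equal weighting of loss terms are all assumptions. The hybrid loss is assumed to take the form commonly associated with the DEMUCS denoiser, an L1 waveform loss plus a multi-resolution STFT loss (spectral convergence and log-magnitude terms).

```python
import numpy as np

def frame(x, size, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, size)."""
    n = 1 + (len(x) - size) // hop
    idx = np.arange(size)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def stft_mag(x, size, hop):
    """Hann-windowed magnitude spectrogram, shape (n_frames, size // 2 + 1)."""
    return np.abs(np.fft.rfft(frame(x, size, hop) * np.hanning(size), axis=1))

def encode(x, w_time, w_bottleneck, size=32, hop=16):
    """Concatenate time- and frequency-domain features, then compress them."""
    frames = frame(x, size, hop)
    time_feat = np.maximum(frames @ w_time, 0.0)   # stand-in for a learned basis + ReLU
    freq_feat = stft_mag(x, size, hop)             # spectral magnitudes per frame
    both = np.concatenate([time_feat, freq_feat], axis=1)
    return both @ w_bottleneck                     # bottleneck as a pointwise projection

def hybrid_loss(est, ref, resolutions=((512, 128), (1024, 256), (2048, 512)), eps=1e-8):
    """L1 waveform loss plus a multi-resolution STFT loss (assumed equal weights)."""
    loss = np.mean(np.abs(est - ref))
    for size, hop in resolutions:
        e, r = stft_mag(est, size, hop), stft_mag(ref, size, hop)
        sc = np.linalg.norm(r - e) / (np.linalg.norm(r) + eps)    # spectral convergence
        mag = np.mean(np.abs(np.log(r + eps) - np.log(e + eps)))  # log-magnitude L1
        loss += (sc + mag) / len(resolutions)
    return loss

# Hypothetical usage on synthetic data (random noise in place of real speech).
rng = np.random.default_rng(0)
clean = rng.standard_normal(4096)
noisy = clean + 0.1 * rng.standard_normal(4096)
w_time = rng.standard_normal((32, 64))             # 32-sample frames -> 64 time features
w_bottleneck = rng.standard_normal((64 + 17, 48))  # 64 time + 17 frequency -> 48 channels
features = encode(noisy, w_time, w_bottleneck)
loss_noisy = hybrid_loss(noisy, clean)
loss_clean = hybrid_loss(clean, clean)
```

In a trainable system the two weight matrices would be learned jointly with the separation layer, and the loss would be computed between the decoder output and the clean reference rather than between raw noisy and clean signals as in this toy usage.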