Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+

IET Networks | IF 1.3 | Q3, Computer Science, Information Systems | Pub Date: 2022-10-14 | DOI: 10.1109/IET-ICETA56553.2022.9971683
Yu-sheng Tsao, Berlin Chen, J. Hung
Citations: 0

Abstract

FullSubNet+ is a highly effective deep learning-based speech enhancement technique built on a full-band and sub-band fusion model. It takes the short-time magnitude spectrogram together with the real and imaginary parts of the complex-valued spectrogram as inputs to train a deep neural network that mainly comprises multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks. To capture the phase information of the input time-domain signal more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as an input source for training the FullSubNet+ framework. Preliminary experiments on the VoiceBank-DEMAND task indicate that, on the test set, FullSubNet+ with STDCT spectrograms achieves higher objective speech quality and intelligibility, as measured by PESQ and STOI scores respectively, than the original FullSubNet+ arrangement. In addition, the STDCT-based FullSubNet+ attains a real-time factor (RTF) of 0.229, lower than the 0.260 of the original FullSubNet+.
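
To make the STDCT input concrete, the following is a minimal sketch of how a short-time DCT spectrogram might be computed, not the authors' implementation. The function names `stdct` and `real_time_factor`, the frame length of 512, the hop of 256, the Hann window, and the choice of an orthonormal type-II DCT are all assumptions made for illustration; the abstract does not specify them. The point it illustrates is that each frame yields a single real-valued, signed coefficient vector, so sign (and thus phase-related) information is carried without separate real and imaginary channels.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import get_window


def stdct(signal, frame_len=512, hop=256, window="hann"):
    """Short-time DCT (illustrative): frame, window, then DCT-II per frame.

    Returns a real-valued array of shape (frame_len, n_frames).
    frame_len, hop and window are assumed values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short inputs so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    win = get_window(window, frame_len)
    # Index matrix of shape (n_frames, frame_len) selecting overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * win
    # Orthonormal DCT-II along the sample axis; output is real and signed.
    return dct(frames, type=2, norm="ortho", axis=1).T


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the processed audio."""
    return processing_seconds / audio_seconds


# One second of a 440 Hz tone at 16 kHz as a stand-in for a speech signal.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
spec = stdct(tone)
print(spec.shape, spec.dtype)
```

Under these assumptions the STDCT feature has the same number of real values per frame as the frame length, whereas a complex spectrogram needs paired real and imaginary parts. An RTF below 1 means processing runs faster than real time, which is the sense in which 0.229 improves on 0.260.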