Single-channel Speech Separation Based on Double-density Dual-tree CWT and SNMF

Md. Imran Hossain, Md. Abdur Rahim, Md. Najmul Hossain
{"title":"Single-channel Speech Separation Based on Double-density Dual-tree CWT and SNMF","authors":"Md. Imran Hossain, Md. Abdur Rahim, Md. Najmul Hossain","doi":"10.33166/aetic.2024.01.001","DOIUrl":null,"url":null,"abstract":"Speech is essential to human communication; therefore, distinguishing it from noise is crucial. Speech separation becomes challenging in real-world circumstances with background noise and overlapping speech. Moreover, the speech separation using short-term Fourier transform (STFT) and discrete wavelet transform (DWT) addresses time and frequency resolution and time-variation issues, respectively. To solve the above issues, a new speech separation technique is presented based on the double-density dual-tree complex wavelet transform (DDDTCWT) and sparse non-negative matrix factorization (SNMF). The signal is separated into high-pass and low-pass frequency components using DDDTCWT wavelet decomposition. For this analysis, we only considered the low-pass frequency components and zeroed out the high-pass ones. Subsequently, the STFT is then applied to each sub-band signal to generate a complex spectrogram. Therefore, we have used SNMF to factorize the joint form of magnitude and the absolute value of real and imaginary (RI) components that decompose the basis and weight matrices. Most researchers enhance the magnitude spectra only, ignore the phase spectra, and estimate the separated speech using noisy phase. As a result, some noise components are present in the estimated speech results. We are dealing with the signal's magnitude as well as the RI components and estimating the phase of the RI parts. Finally, separated speech signals can be achieved using the inverse STFT (ISTFT) and the inverse DDDTCWT (IDDDTCWT). Separation performance is improved for estimating the phase component and the shift-invariant, better direction selectivity, and scheme freedom properties of DDDTCWT. The speech separation efficiency of the proposed algorithm outperforms performance by 6.53–8.17 dB SDR gain, 7.37-9.87 dB SAR gain, and 14.92–17.21 dB SIR gain compared to the NMF method with masking on the TIMIT dataset.","PeriodicalId":36440,"journal":{"name":"Annals of Emerging Technologies in Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Emerging Technologies in Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.33166/aetic.2024.01.001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
引用次数: 0

Abstract

Speech is essential to human communication; therefore, distinguishing it from noise is crucial. Speech separation becomes challenging in real-world circumstances with background noise and overlapping speech. Moreover, the speech separation using short-term Fourier transform (STFT) and discrete wavelet transform (DWT) addresses time and frequency resolution and time-variation issues, respectively. To solve the above issues, a new speech separation technique is presented based on the double-density dual-tree complex wavelet transform (DDDTCWT) and sparse non-negative matrix factorization (SNMF). The signal is separated into high-pass and low-pass frequency components using DDDTCWT wavelet decomposition. For this analysis, we only considered the low-pass frequency components and zeroed out the high-pass ones. Subsequently, the STFT is then applied to each sub-band signal to generate a complex spectrogram. Therefore, we have used SNMF to factorize the joint form of magnitude and the absolute value of real and imaginary (RI) components that decompose the basis and weight matrices. Most researchers enhance the magnitude spectra only, ignore the phase spectra, and estimate the separated speech using noisy phase. As a result, some noise components are present in the estimated speech results. We are dealing with the signal's magnitude as well as the RI components and estimating the phase of the RI parts. Finally, separated speech signals can be achieved using the inverse STFT (ISTFT) and the inverse DDDTCWT (IDDDTCWT). Separation performance is improved for estimating the phase component and the shift-invariant, better direction selectivity, and scheme freedom properties of DDDTCWT. The speech separation efficiency of the proposed algorithm outperforms performance by 6.53–8.17 dB SDR gain, 7.37-9.87 dB SAR gain, and 14.92–17.21 dB SIR gain compared to the NMF method with masking on the TIMIT dataset.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于双密度双树 CWT 和 SNMF 的单通道语音分离技术
语音是人类交流的基本要素,因此将语音与噪音区分开来至关重要。在现实世界中,由于背景噪声和语音重叠,语音分离变得极具挑战性。此外,使用短期傅里叶变换(STFT)和离散小波变换(DWT)进行语音分离时,需要分别解决时间和频率分辨率以及时变问题。为解决上述问题,本文提出了一种基于双密度双树复小波变换(DDDTCWT)和稀疏非负矩阵因式分解(SNMF)的新型语音分离技术。通过 DDDTCWT 小波分解,信号被分离成高通和低通频率分量。在本分析中,我们只考虑低通频率分量,而将高通频率分量清零。随后,STFT 应用于每个子带信号,生成复频谱图。因此,我们使用 SNMF 对分解基矩阵和权重矩阵的幅值和实部与虚部(RI)分量的绝对值的联合形式进行因式分解。大多数研究人员只增强了幅度频谱,忽略了相位频谱,并使用噪声相位来估计分离的语音。因此,在估计的语音结果中会出现一些噪声成分。我们既要处理信号的幅度,也要处理 RI 分量,并估算 RI 部分的相位。最后,可以使用反 STFT(ISTFT)和反 DDDTCWT(IDDTCWT)来分离语音信号。在估计相位分量和 DDDTCWT 的移位不变性、更好的方向选择性和方案自由度特性时,分离性能得到了提高。在 TIMIT 数据集上,与带掩码的 NMF 方法相比,所提算法的语音分离效率提高了 6.53-8.17 dB SDR 增益、7.37-9.87 dB SAR 增益和 14.92-17.21 dB SIR 增益。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Annals of Emerging Technologies in Computing
Annals of Emerging Technologies in Computing Computer Science-Computer Science (all)
CiteScore
3.50
自引率
0.00%
发文量
26
期刊最新文献
The Proposal of Countermeasures for DeepFake Voices on Social Media Considering Waveform and Text Embedding Lightweight Model for Occlusion Removal from Face Images A Torpor-based Enhanced Security Model for CSMA/CA Protocol in Wireless Networks Enhancing Robot Navigation Efficiency Using Cellular Automata with Active Cells Wildfire Prediction in the United States Using Time Series Forecasting Models
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1