基于自适应窗口大小选择的频谱细化语音检测和基频估计

2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) Pub Date : 2020-12-09 DOI:10.1109/ISSPIT51521.2020.9408968

N. Madhu, Mohammed Krini

{"title":"基于自适应窗口大小选择的频谱细化语音检测和基频估计","authors":"N. Madhu, Mohammed Krini","doi":"10.1109/ISSPIT51521.2020.9408968","DOIUrl":null,"url":null,"abstract":"Spectral refinement (SR) offers a computationally in-expensive means of generating a refined (higher resolution) signal spectrum by linearly combining the spectra of shorter, contiguous signal segments. The benefit of this method has previously been demonstrated on the problem of fundamental frequency (F0) estimation in speech processing – specifically for the improved estimation of very low F0. One drawback of SR is, however, the poorer detection of voicing onsets due to the Heisenberg-Gabor limit on time and frequency resolution. This may also lead to degraded performance in noisy conditions. Transitioning between long- and short-time windows for the spectral analysis may offer a good trade-off in these situations. This contribution presents a method to adaptively switch between short- and long-time windows (and, correspondingly, between the short-term and the refined spectrum) for voicing detection and F0 estimation. The improvements in voicing detection and F0 estimation due to this adaptive switching is conclusively demonstrated on audio signals in clean and corrupted conditions.","PeriodicalId":111385,"journal":{"name":"2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Spectral refinement with adaptive window-size selection for voicing detection and fundamental frequency estimation\",\"authors\":\"N. Madhu, Mohammed Krini\",\"doi\":\"10.1109/ISSPIT51521.2020.9408968\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Spectral refinement (SR) offers a computationally in-expensive means of generating a refined (higher resolution) signal spectrum by linearly combining the spectra of shorter, contiguous signal segments. The benefit of this method has previously been demonstrated on the problem of fundamental frequency (F0) estimation in speech processing – specifically for the improved estimation of very low F0. One drawback of SR is, however, the poorer detection of voicing onsets due to the Heisenberg-Gabor limit on time and frequency resolution. This may also lead to degraded performance in noisy conditions. Transitioning between long- and short-time windows for the spectral analysis may offer a good trade-off in these situations. This contribution presents a method to adaptively switch between short- and long-time windows (and, correspondingly, between the short-term and the refined spectrum) for voicing detection and F0 estimation. The improvements in voicing detection and F0 estimation due to this adaptive switching is conclusively demonstrated on audio signals in clean and corrupted conditions.\",\"PeriodicalId\":111385,\"journal\":{\"name\":\"2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSPIT51521.2020.9408968\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSPIT51521.2020.9408968","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

频谱细化(SR)提供了一种计算成本昂贵的方法，通过线性组合较短的连续信号段的频谱来生成精细(更高分辨率)的信号频谱。这种方法的好处已经在语音处理中的基频(F0)估计问题上得到了证明-特别是对于非常低的F0的改进估计。然而，SR的一个缺点是，由于时间和频率分辨率上的海森堡- gabor限制，对声音发作的检测较差。这也可能导致在噪声条件下性能下降。在这些情况下，在光谱分析的长时间窗口和短时间窗口之间进行转换可以提供一个很好的权衡。这一贡献提出了一种自适应切换短窗口和长窗口(相应地，在短期和精细频谱之间)的方法，用于语音检测和F0估计。由于这种自适应开关在声音检测和F0估计方面的改进最终证明了在干净和损坏条件下的音频信号。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Spectral refinement with adaptive window-size selection for voicing detection and fundamental frequency estimation

Spectral refinement (SR) offers a computationally in-expensive means of generating a refined (higher resolution) signal spectrum by linearly combining the spectra of shorter, contiguous signal segments. The benefit of this method has previously been demonstrated on the problem of fundamental frequency (F0) estimation in speech processing – specifically for the improved estimation of very low F0. One drawback of SR is, however, the poorer detection of voicing onsets due to the Heisenberg-Gabor limit on time and frequency resolution. This may also lead to degraded performance in noisy conditions. Transitioning between long- and short-time windows for the spectral analysis may offer a good trade-off in these situations. This contribution presents a method to adaptively switch between short- and long-time windows (and, correspondingly, between the short-term and the refined spectrum) for voicing detection and F0 estimation. The improvements in voicing detection and F0 estimation due to this adaptive switching is conclusively demonstrated on audio signals in clean and corrupted conditions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)

自引率

0.00%

发文量