Effect of anti-aliasing filtering on the quality of speech from an HMM-based synthesizer

Y. Shiga
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Published: 2012-08-31
DOI: 10.1109/ICASSP.2012.6288924 (https://doi.org/10.1109/ICASSP.2012.6288924)
Citations: 0

Abstract

This paper investigates how the quality of speech produced by statistical parametric synthesis is affected by anti-aliasing filtering, i.e., the low-pass filtering applied before (down)sampling prerecorded speech to a desired rate. It is known empirically that the frequency response of such anti-aliasing filters considerably influences the quality of the synthesized speech. To understand this influence more clearly, we examine the spectral aspects of speech involved in HMM training and synthesis. We then propose a feature extraction technique that avoids producing the roll-off characteristic of the frequency response near the Nyquist frequency, which is found to be the major cause of the speech quality degradation resulting from anti-aliasing filtering. In this technique, the spectrum is first computed from speech at a sampling rate higher than the desired rate; it is then truncated so that the frequency range above the target Nyquist frequency is discarded; finally, the truncated spectrum is converted directly into the cepstrum. Listening test results show that the proposed technique enables HMMs to be trained efficiently, with a limited number of model parameters, and effectively, with fewer artifacts in the speech synthesized at the desired sampling rate.
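
The three steps named in the abstract (compute the spectrum at the higher rate, truncate it at the target Nyquist frequency, convert the truncated spectrum to a cepstrum) can be illustrated with a minimal sketch. The abstract does not say which spectral analysis method or cepstral variant the paper uses, so this example assumes a Hann-windowed FFT log-magnitude spectrum and a plain real cepstrum. The function name `truncated_cepstrum` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def truncated_cepstrum(frame, fs_high, fs_target, n_ceps=40):
    """Cepstrum of one frame, with the spectrum cut at the target Nyquist frequency.

    Assumptions (not taken from the paper): Hann window, FFT log-magnitude
    spectrum, plain real cepstrum, even frame length.
    """
    n_fft = len(frame)

    # Step 1: log-magnitude spectrum at the original (higher) sampling rate.
    # rfft returns bins k = 0 .. n_fft/2, where bin k lies at k * fs_high / n_fft Hz.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    log_spectrum = np.log(np.maximum(spectrum, 1e-10))  # floor avoids log(0)

    # Step 2: discard every bin above the target Nyquist frequency fs_target / 2.
    n_keep = int(round(fs_target * n_fft / (2 * fs_high))) + 1
    truncated = log_spectrum[:n_keep]

    # Step 3: convert the truncated half-spectrum directly into a cepstrum.
    # irfft treats the last kept bin as the new Nyquist bin, so no low-pass
    # filter (and hence no filter roll-off) is ever applied to the waveform.
    cepstrum = np.fft.irfft(truncated)
    return cepstrum[:n_ceps]


# Example: a 1024-sample frame recorded at 48 kHz, target rate 16 kHz.
rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)
ceps = truncated_cepstrum(frame, fs_high=48000, fs_target=16000)
print(ceps.shape)
```

In this sketch the waveform is never resampled, so the anti-aliasing filter's frequency response cannot leave its mark on the features. When `fs_target` equals `fs_high`, the function reduces to the conventional real cepstrum of the full spectrum.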