A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise

F. Soong, M. Sondhi
{"title":"A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise","authors":"F. Soong, M. Sondhi","doi":"10.1109/ICASSP.1987.1169899","DOIUrl":null,"url":null,"abstract":"The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, specially if it is not feasible to train and to test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure which weights the high SNR regions in frequency more than the low SNR regions. For the weighting function we choose a \"bandwidth broadened\" test spectrum; it weights spectral distortion more at the peaks than at the valleys of the spectrum. The amount of weighting is adapted according to an estimate of SNR, and becomes essentially constant in the noise-free case. The new measure has the dot product form and computaional efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10 speaker, isolated digit data base in a series of speaker independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞ dB). The new measure performs as well as the original unweighted Itakura distortion measure at high SNR's, and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49% while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNR's, is about 5 - 7 dB.","PeriodicalId":140810,"journal":{"name":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1987-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.1987.1169899","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 50

Abstract

The performance of a recognizer based on the Itakura spectral distortion measure deteriorates when speech signals are corrupted by noise, especially if it is not feasible to train and test the recognizer under similar noise conditions. To alleviate this problem, we consider a more noise-resistant, weighted spectral distortion measure that weights high-SNR frequency regions more heavily than low-SNR regions. For the weighting function we choose a "bandwidth-broadened" test spectrum; it weights spectral distortion more at the peaks of the spectrum than at the valleys. The amount of weighting is adapted according to an estimate of the SNR and becomes essentially constant in the noise-free case. The new measure retains the dot-product form and computational efficiency of the Itakura distortion measure in the autocorrelation domain. It has been tested on a 10-speaker, isolated-digit database in a series of speaker-independent speech recognition experiments. Additive white Gaussian noise was used to simulate different SNR conditions (from 5 dB to ∞). The new measure performs as well as the original unweighted Itakura distortion measure at high SNRs and significantly better at medium to low SNRs. At an SNR of 5 dB, the new measure achieves a digit error rate of 12.49%, while the original Itakura distortion gives an error rate of 27.6%. The equivalent SNR improvement at low SNRs is about 5 to 7 dB.
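To make the abstract's description concrete, below is a minimal NumPy sketch (not taken from the paper) of the unweighted Itakura distortion in its autocorrelation-domain dot-product form, together with an illustrative frequency-domain evaluation of a weighted variant that uses a bandwidth-broadened test spectrum as the weight. Bandwidth broadening multiplies the LPC coefficients by γ^k, which pulls the poles toward the origin and flattens the test spectrum as γ decreases. The gamma_from_snr mapping, frame length, FFT-based weighting, and all function names are assumptions for illustration only; the paper's efficient autocorrelation-domain formulation of the weighted measure is not reproduced here.

```python
import numpy as np


def autocorrelation(x, p):
    """First p+1 autocorrelation lags of a windowed frame x."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])


def levinson_durbin(r):
    """LPC inverse-filter coefficients [1, a1, ..., ap] and the minimum
    residual energy a^T R a, from autocorrelation lags r[0..p]."""
    p = len(r) - 1
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_new = a.copy()
        a_new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a_new[i] = k
        a = a_new
        err *= 1.0 - k * k
    return a, err


def coeff_autocorr(a):
    """Autocorrelation of the LPC coefficient sequence itself."""
    p = len(a) - 1
    return np.array([np.dot(a[:p + 1 - k], a[k:]) for k in range(p + 1)])


def itakura_distortion(r_test, a_ref):
    """Unweighted Itakura distortion log(a_ref^T R_test a_ref / a_test^T R_test a_test),
    with the quadratic form evaluated as a dot product of autocorrelation sequences."""
    a_test, min_err = levinson_durbin(r_test)
    ra = coeff_autocorr(a_ref)
    num = ra[0] * r_test[0] + 2.0 * np.dot(ra[1:], r_test[1:])
    return np.log(num / min_err)


def gamma_from_snr(snr_db, lo=0.4, hi=0.95):
    """Hypothetical SNR-to-broadening map (assumption): near-flat weighting at
    high SNR, stronger peak emphasis at low SNR. The paper adapts the weighting
    from an SNR estimate, but the exact rule is not given in the abstract."""
    t = np.clip(snr_db / 30.0, 0.0, 1.0)
    return lo + (hi - lo) * (1.0 - t)


def weighted_itakura(r_test, a_ref, snr_db, n_fft=256):
    """Illustrative frequency-domain evaluation of a spectrally weighted
    Itakura-style distortion, using a bandwidth-broadened all-pole test
    spectrum as the weight. The paper computes an equivalent quantity
    efficiently in the autocorrelation domain; that form is not shown here."""
    a_test, _ = levinson_durbin(r_test)
    gamma = gamma_from_snr(snr_db)
    a_broad = a_test * gamma ** np.arange(len(a_test))   # poles pulled inward -> broader bandwidths
    w = 1.0 / np.abs(np.fft.rfft(a_broad, n_fft)) ** 2   # broadened test spectrum as weight
    w /= w.mean()
    s_ref = np.abs(np.fft.rfft(a_ref, n_fft)) ** 2       # |A_ref(w)|^2
    s_test = np.abs(np.fft.rfft(a_test, n_fft)) ** 2     # |A_test(w)|^2
    return np.log(np.mean(w * s_ref / s_test) / np.mean(w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(240)            # stand-in for one windowed speech frame
    r = autocorrelation(frame, p=10)
    a_ref, _ = levinson_durbin(r)               # use the frame itself as the "reference"
    print(itakura_distortion(r, a_ref))         # ~0 for identical test and reference
    print(weighted_itakura(r, a_ref, snr_db=5.0))
```

In this sketch the quadratic form a^T R a is computed as ra(0)·r(0) + 2·Σ ra(k)·r(k), the dot-product structure the abstract refers to; the weighting only changes how the reference and test inverse-filter spectra are compared, so the recognizer's template matching is otherwise unchanged.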