Evaluation of smoothed group delay spectrum distance measure in speaker-independent speech recognition

Taizo Umezaki, Harald Singer, Fumitada Itakura
Electronics and Communications in Japan (Part III: Fundamental Electronic Science). Published 2007-02-22. DOI: 10.1002/ECJC.4430741005 (https://doi.org/10.1002/ECJC.4430741005)

Abstract

The smoothed group delay spectrum (SGDS) distance measure is evaluated in speaker-independent recognition experiments. First, the appropriate level of smoothing of the group delay spectrum (GDS) is investigated by adding noise, etc., to the input speech. Then a comparison with the speaker-dependent case is made. An experiment is reported in which, for low-amplitude parts of speech (e.g., unvoiced speech), the standard (LPC) distance measure is used in the interframe distance calculation instead of the SGDS distance measure. This method prevents a loss of recognition accuracy due to too strong an emphasis on certain spectral elements, and a consistently high recognition accuracy can be achieved.

Finally, the SGDS distance measure is evaluated for the case where the GDS is represented in the spectral domain as a discrete Fourier transform (DFT) of the LPC coefficients. Compared with the SGDS calculated by weighting the LPC cepstrum coefficients, computation time and memory space can be reduced without loss of recognition accuracy. Furthermore, a low-bit quantization of the GDS is reported, and a high recognition rate is achieved with only 32 bits per frame.
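The two routes to the GDS mentioned in the abstract (via LPC cepstrum coefficients and via a DFT of the LPC coefficients) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the all-pole model 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and it computes only the unsmoothed GDS, because the abstract does not give the smoothing weights. The function names, the FFT size, the number of cepstral terms, and the mean-squared `gds_distance` are all assumptions made for illustration.

```python
import numpy as np


def gds_from_lpc_dft(a, n_fft=256):
    """Group delay of the all-pole model 1/A(z) from DFTs of the LPC coefficients.

    a = [1, a_1, ..., a_p] with A(z) = 1 + sum_k a_k z^-k (assumed convention).
    """
    a = np.asarray(a, dtype=float)
    n = np.arange(len(a))
    A = np.fft.rfft(a, n_fft)      # A(e^{jw}) on n_fft//2 + 1 frequency bins
    B = np.fft.rfft(n * a, n_fft)  # DFT of n * a_n, which equals j * dA/dw
    return -np.real(B / A)         # group delay of 1/A is minus that of A


def gds_from_cepstrum(a, n_fft=256, n_cep=400):
    """Same quantity as a cosine series in n * c_n (c_n: LPC cepstrum of 1/A)."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    g = np.zeros(n_cep + 1)        # g[n] = n * c_n, built by the usual recursion
    for n in range(1, n_cep + 1):
        acc = -n * a[n] if n <= p else 0.0
        for k in range(1, min(n - 1, p) + 1):
            acc -= a[k] * g[n - k]
        g[n] = acc
    omega = 2.0 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    # tau(w) = sum_n n * c_n * cos(w n), truncated after n_cep terms
    return np.cos(np.outer(omega, np.arange(1, n_cep + 1))) @ g[1:]


def gds_distance(tau_x, tau_y):
    """Hypothetical interframe distance: mean squared GDS difference."""
    return float(np.mean((np.asarray(tau_x) - np.asarray(tau_y)) ** 2))


# Toy stable model with two conjugate pole pairs (illustrative values only).
poles = [0.9 * np.exp(1j * 0.3 * np.pi), 0.85 * np.exp(1j * 0.7 * np.pi)]
poles = poles + [np.conj(z) for z in poles]
a = np.real(np.poly(poles))        # [1, a_1, ..., a_4]

tau_dft = gds_from_lpc_dft(a)
tau_cep = gds_from_cepstrum(a)
print("max difference between the two routes:", np.max(np.abs(tau_dft - tau_cep)))
```

In this sketch the DFT route needs two short FFTs per frame, whereas the cepstrum route needs a recursion plus a cosine transform over many terms, which is at least consistent with the reduction in computation reported in the abstract.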