Maximum kurtosis beamforming with a subspace filter for distant speech recognition

K. Kumatani, J. McDonough, B. Raj
Published in: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding
Publication date: 2011-12-01
DOI: 10.1109/ASRU.2011.6163927 (https://doi.org/10.1109/ASRU.2011.6163927)
Citations: 9

Abstract

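This paper presents a new beamforming method for distant speech recognition (DSR). The dominant-mode subspace is used to efficiently estimate the active weight vectors for maximum kurtosis (MK) beamforming with the generalized sidelobe canceler (GSC). We demonstrated in [1], [2], [3] that beamforming based on the maximum kurtosis criterion can remove the effects of reverberation and noise without the signal cancellation encountered in conventional beamforming algorithms. The MK beamforming algorithm, however, requires a relatively large amount of data to estimate the active weight vector reliably, because it relies on a numerical optimization algorithm. To make estimation efficient, we propose cascading a subspace (eigenspace) filter [4, §6.8] with the active weight vector. The subspace filter decomposes the output of the blocking matrix into directional signals and ambient noise components. The ambient noise components are then averaged and subtracted from the beamformer's output, which leads to reliable estimation as well as a significant reduction in computation. We show the effectiveness of our method through a set of distant speech recognition experiments on real microphone array data captured in a real environment. Our new beamforming algorithm provided the best recognition performance among the beamforming techniques compared, a word error rate (WER) of 5.3%, which is comparable to the WER of 4.2% obtained with a close-talking microphone. Moreover, it achieved better recognition performance with less adaptation data than the conventional MK beamformer.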
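The structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the QR-based blocking matrix, the constant `beta` in the kurtosis statistic, and the way the noise eigenvectors are averaged into a single channel are all assumptions made for illustration. The numerical optimization of the active weight vector is left out; the sketch only shows how the pieces fit together for one subband.

```python
import numpy as np


def empirical_kurtosis(y, beta=3.0):
    """Sample estimate of E[|y|^4] - beta * (E[|y|^2])^2.

    beta is a free constant here (3 matches a real Gaussian reference);
    the value used in the paper is not stated in the abstract.
    """
    p2 = np.mean(np.abs(y) ** 2)
    p4 = np.mean(np.abs(y) ** 4)
    return p4 - beta * p2 ** 2


def blocking_matrix(w_q):
    """Return B (N x N-1) with orthonormal columns and B^H w_q = 0."""
    n = w_q.shape[0]
    # Complete QR of w_q as a column: the first column of Q is parallel
    # to w_q, the remaining N-1 columns span its orthogonal complement.
    q, _ = np.linalg.qr(w_q.reshape(n, 1), mode="complete")
    return q[:, 1:]


def subspace_filter(z, num_dominant):
    """Build U ((N-1) x (D+1)) from blocking-matrix output z ((N-1) x T).

    The D eigenvectors with the largest eigenvalues model directional
    signals; the remaining eigenvectors are averaged into one column
    that stands in for ambient noise (an assumed reading of the abstract).
    """
    if not 0 < num_dominant < z.shape[0]:
        raise ValueError("num_dominant must be in 1 .. N-2")
    cov = z @ z.conj().T / z.shape[1]
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    eigvecs = eigvecs[:, ::-1]            # reorder to descending
    u_dominant = eigvecs[:, :num_dominant]
    u_noise = eigvecs[:, num_dominant:].mean(axis=1, keepdims=True)
    return np.hstack([u_dominant, u_noise])


def gsc_output(x, w_q, b, u, w_a):
    """GSC output Y = (w_q - B U w_a)^H X for snapshots x (N x T)."""
    w = w_q - b @ (u @ w_a)
    return w.conj() @ x
```

In this sketch the active weight vector `w_a` has only `num_dominant + 1` entries instead of N-1, which is the sense in which a subspace filter reduces the number of parameters to estimate. An MK beamformer would then search for the `w_a` that maximizes `empirical_kurtosis(gsc_output(...))`, typically with a gradient-based optimizer.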