Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition

Cong-Thanh Do, M. Taghizadeh, Philip N. Garner
{"title":"Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition","authors":"Cong-Thanh Do, M. Taghizadeh, Philip N. Garner","doi":"10.1109/SLT.2012.6424211","DOIUrl":null,"url":null,"abstract":"This paper investigates the combination of cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition. Testing speech signals are recorded by a circular microphone array and are subsequently processed with superdirective beamforming and McCowan post-filtering. Training speech signals, from the multichannel overlapping Number corpus (MONC), are clean and not overlapping. Cochlear implant-like speech processing, which is inspired from the speech processing strategy in cochlear implants, is applied on the training and testing speech signals. Cepstral normalization, including cepstral mean and variance normalization (CMN and CVN), are applied on the training and testing cepstra. Experiments show that implementing either cepstral normalization or cochlear implant-like speech processing helps in reducing the WERs of microphone array-based speech recognition. Combining cepstral normalization and cochlear implant-like speech processing reduces further the WERs, when there is overlapping speech. Train/test mismatches are measured using the Kullback-Leibler divergence (KLD), between the global probability density functions (PDFs) of training and testing cepstral vectors. This measure reveals a train/test mismatch reduction when either cepstral normalization or cochlear implant-like speech processing is used. It reveals also that combining these two processing reduces further the train/test mismatches as well as the WERs.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424211","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

This paper investigates the combination of cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition. Test speech signals are recorded by a circular microphone array and are subsequently processed with superdirective beamforming and McCowan post-filtering. Training speech signals, from the Multichannel Overlapping Numbers Corpus (MONC), are clean and non-overlapping. Cochlear implant-like speech processing, inspired by the speech processing strategy used in cochlear implants, is applied to both the training and testing speech signals. Cepstral normalization, comprising cepstral mean normalization (CMN) and cepstral variance normalization (CVN), is applied to the training and testing cepstra. Experiments show that applying either cepstral normalization or cochlear implant-like speech processing reduces the word error rates (WERs) of microphone array-based speech recognition. Combining cepstral normalization and cochlear implant-like speech processing further reduces the WERs when overlapping speech is present. Train/test mismatch is measured using the Kullback-Leibler divergence (KLD) between the global probability density functions (PDFs) of the training and testing cepstral vectors. This measure reveals a reduction in train/test mismatch when either cepstral normalization or cochlear implant-like speech processing is used, and shows that combining the two techniques further reduces both the train/test mismatch and the WERs.
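The paper does not include code; the following is a minimal sketch of two ingredients the abstract mentions: per-utterance cepstral mean and variance normalization (CMN/CVN), and a Kullback-Leibler divergence between global PDFs of training and testing cepstral vectors. Modeling each global PDF as a single diagonal-covariance Gaussian, the 13-dimensional feature size, and the function names are illustrative assumptions, not necessarily the authors' exact setup.

```python
# Illustrative sketch (not the authors' implementation) of CMN/CVN and of a
# KLD-based train/test mismatch measure between global cepstral PDFs.
import numpy as np


def cmvn(cepstra: np.ndarray) -> np.ndarray:
    """Cepstral mean and variance normalization (CMN + CVN).

    cepstra: (num_frames, num_coeffs) matrix of cepstral vectors.
    Returns features with zero mean and unit variance in each dimension.
    """
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0) + 1e-10  # guard against division by zero
    return (cepstra - mean) / std


def gaussian_kld(train: np.ndarray, test: np.ndarray) -> float:
    """KL divergence D(p_train || p_test) between diagonal-covariance
    Gaussians fitted to the pooled training and testing cepstral vectors
    (a simplifying assumption for the 'global PDFs' of the abstract)."""
    mu_p, var_p = train.mean(axis=0), train.var(axis=0) + 1e-10
    mu_q, var_q = test.mean(axis=0), test.var(axis=0) + 1e-10
    # Closed form for diagonal Gaussians, summed over feature dimensions.
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_cepstra = rng.normal(0.0, 1.0, size=(5000, 13))  # stand-in for clean training cepstra
    test_cepstra = rng.normal(0.5, 1.5, size=(5000, 13))   # stand-in for mismatched test cepstra

    print("KLD before normalization:", gaussian_kld(train_cepstra, test_cepstra))
    print("KLD after CMN+CVN:",
          gaussian_kld(cmvn(train_cepstra), cmvn(test_cepstra)))
```

Under these assumptions, normalizing both sets of cepstra shrinks the KLD, which is the kind of train/test mismatch reduction the abstract reports for cepstral normalization.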