
2011 IEEE Workshop on Automatic Speech Recognition & Understanding — Latest Publications

Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163956
David Imseng, Ramya Rasipuram, M. Magimai.-Doss
One of the main challenges in non-native speech recognition is how to handle the acoustic variability present in multi-accented non-native speech with a limited amount of training data. In this paper, we investigate an approach that addresses this challenge by using Kullback-Leibler divergence based hidden Markov models (KL-HMM). More precisely, the acoustic variability in the multi-accented speech is handled by using multilingual phoneme posterior probabilities, estimated by a multilayer perceptron trained on auxiliary data, as input features for the KL-HMM system. With limited training data, we then build better acoustic models by exploiting the advantage that the KL-HMM system has fewer parameters. On the HIWIRE corpus, the proposed approach yields a word error rate (WER) of 1.9% with 149 minutes of training data and a WER of 5.5% with 2 minutes of training data.
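A minimal sketch of the KL-HMM idea described in the abstract: each HMM state is parameterized by a categorical distribution over multilingual phoneme classes, and the per-frame emission "cost" is the KL divergence between that state distribution and the MLP posterior vector. The function and variable names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def kl_local_score(state_dist: np.ndarray, posterior: np.ndarray, eps: float = 1e-12) -> float:
    """KL(state_dist || posterior) for one frame; a lower score means a better match."""
    p = np.clip(state_dist, eps, None)
    q = np.clip(posterior, eps, None)
    return float(np.sum(p * np.log(p / q)))

# Toy example: a 3-class state distribution vs. one frame's MLP posterior.
state = np.array([0.7, 0.2, 0.1])
frame_posterior = np.array([0.6, 0.3, 0.1])
print(kl_local_score(state, frame_posterior))
```

Because a state is just one categorical distribution per phoneme class, the parameter count stays small, which is the property the abstract exploits when training data are scarce.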
Citations: 27
Strategies for training large scale neural network language models
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163930
Tomas Mikolov, Anoop Deoras, Daniel Povey, L. Burget, J. Černocký
We describe how to effectively train neural network based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as a part of the neural network model. This leads to a significant reduction of computational complexity. We achieved around a 10% relative reduction in word error rate on an English Broadcast News speech recognition task, compared with a large 4-gram model trained on 400M tokens.
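A hedged sketch of the hash-based maximum-entropy component mentioned above: n-gram features are mapped into a fixed-size weight table with a hash function, so no explicit feature dictionary is ever materialized, and collisions are simply accepted. The hashing scheme, table size, and toy update below are assumptions for illustration; in the paper these weights are trained jointly with the neural network.

```python
import numpy as np

HASH_SIZE = 2 ** 20            # fixed table size; collisions are tolerated
weights = np.zeros(HASH_SIZE)  # one weight per hashed n-gram feature

def ngram_hash(context: tuple, word: str) -> int:
    """Map an (n-gram context, word) feature to a slot in the weight table."""
    return hash((context, word)) % HASH_SIZE

def maxent_score(context: tuple, word: str) -> float:
    """Unnormalized maximum-entropy score contributed by the hashed n-gram feature."""
    return weights[ngram_hash(context, word)]

# Toy gradient-style touch of the table: raise the observed word, lower a competitor.
lr = 0.1
weights[ngram_hash(("the", "cat"), "sat")] += lr
weights[ngram_hash(("the", "cat"), "mat")] -= lr
print(maxent_score(("the", "cat"), "sat"))
```

The fixed table size is what keeps memory and computation bounded even when the number of distinct n-grams grows with the corpus.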
Citations: 528
Derivative kernels for noise robust ASR
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163916
A. Ragni, M. Gales
Recently there has been interest in combining generative and discriminative classifiers. In these classifiers, the features for the discriminative models are derived from generative kernels. One advantage of using generative kernels is that systematic approaches exist for introducing complex dependencies into the feature space. Furthermore, as the features are based on generative models, standard model-based compensation and adaptation techniques can be applied to make the discriminative models robust to noise and speaker conditions. This paper extends previous work in this framework in several directions. First, it introduces derivative kernels based on context-dependent generative models. Second, it describes how derivative kernels can be incorporated into structured discriminative models. Third, it addresses the issues associated with the large number of classes and parameters that arise when context-dependent models and the high-dimensional feature spaces of derivative kernels are used. The approach is evaluated on two noise-corrupted tasks: the small-vocabulary AURORA 2 task and the medium-to-large-vocabulary AURORA 4 task.
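An illustrative sketch of a derivative-kernel feature, under assumptions: the score-space feature for an observation sequence is the gradient of a generative model's log-likelihood with respect to its parameters, here the means of a diagonal-covariance GMM standing in for the context-dependent models of the paper. Function and variable names are hypothetical.

```python
import numpy as np

def gmm_derivative_features(frames, weights, means, variances):
    """Return d log p(frames) / d means of a diagonal-covariance GMM, flattened into one vector."""
    grad = np.zeros_like(means)
    for x in frames:
        diff = x - means                                                  # (K, D)
        log_comp = (np.log(weights)
                    - 0.5 * np.sum(np.log(2 * np.pi * variances) + diff ** 2 / variances, axis=1))
        post = np.exp(log_comp - np.logaddexp.reduce(log_comp))          # responsibilities (K,)
        grad += post[:, None] * diff / variances                         # per-frame gradient w.r.t. means
    return grad.ravel()

# Toy 2-component, 2-dimensional example.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 2))
feat = gmm_derivative_features(frames,
                               weights=np.array([0.5, 0.5]),
                               means=np.zeros((2, 2)),
                               variances=np.ones((2, 2)))
print(feat.shape)  # (4,) — this vector would feed a discriminative classifier
```

Because the features are functions of the generative model's parameters, compensating that model for noise (as the abstract notes) automatically compensates the features as well.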
Citations: 29
Sparse Maximum A Posteriori adaptation
Pub Date : 2011-10-28 DOI: 10.1109/ASRU.2011.6163905
P. Olsen, Jing Huang, V. Goel, Steven J. Rennie
Maximum A Posteriori (MAP) adaptation is a powerful tool for building speaker-specific acoustic models. Modern speech applications utilize acoustic models with millions of parameters and serve millions of users. Storing an acoustic model for each user in such settings is costly. However, speaker-specific acoustic models are generally similar to the acoustic model being adapted. By imposing sparseness constraints, we can save significantly on storage and even improve the quality of the resulting speaker-dependent model. In this paper we utilize the ℓ1 or ℓ0 norm as a regularizer to induce sparsity. We show that we can obtain up to 95% sparsity with negligible loss in recognition accuracy with both penalties. By removing small differences, which constitute "adaptation noise", sparse MAP is actually able to improve upon MAP adaptation. Sparse MAP reduces the MAP word error rate by 2% relative at 89% sparsity.
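A minimal sketch, under assumptions, of the ℓ1-regularized idea: shifts of the adapted means away from the speaker-independent (SI) model that fall below a threshold are set exactly to zero, so only the sparse differences need to be stored per speaker. The closed-form soft-thresholding step below assumes a simple quadratic data term; the paper's actual objective and optimizer may differ.

```python
import numpy as np

def sparse_map_means(si_means: np.ndarray, map_means: np.ndarray, lam: float) -> np.ndarray:
    """Soft-threshold the MAP update so small 'adaptation noise' shifts become exactly zero."""
    delta = map_means - si_means
    sparse_delta = np.sign(delta) * np.maximum(np.abs(delta) - lam, 0.0)
    return si_means + sparse_delta

si = np.array([0.0, 1.0, -2.0, 0.5])
adapted = np.array([0.05, 1.4, -2.01, 0.48])
result = sparse_map_means(si, adapted, lam=0.1)
print(result)                                         # tiny shifts are zeroed out
print("fraction left at SI value:", np.mean(result == si))
```

Only the nonzero entries of the difference vector need to be stored per user, which is where the storage saving claimed in the abstract comes from.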
Citations: 5
An hierarchical exemplar-based sparse model of speech, with an application to ASR
Pub Date : 1900-01-01 DOI: 10.1109/ASRU.2011.6163913
J. Gemmeke, H. V. hamme
We propose a hierarchical exemplar-based model of speech, as well as a new algorithm, to efficiently find sparse linear combinations of exemplars in dictionaries containing hundreds of thousands of exemplars. We use a variant of hierarchical agglomerative clustering to find a hierarchy connecting all exemplars, so that each exemplar is a parent to two child nodes. We use a modified version of a multiplicative-updates based algorithm to find sparse representations, starting from a small active set of exemplars from the dictionary: on each iteration, we replace exemplars that have an increasing weight by their child nodes. We illustrate the properties of the proposed method by investigating computational effort, the accuracy of the eventual sparse representation, and speech recognition accuracy on a digit recognition task.
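A hedged sketch of the exemplar-based sparse representation: find nonnegative weights x so that D @ x approximates the observation y, using multiplicative updates with an additive sparsity penalty. The hierarchical active-set step (swapping a heavily weighted parent exemplar for its two children) is only indicated in a comment; names and the Euclidean objective are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sparse_code(D: np.ndarray, y: np.ndarray, sparsity: float = 0.1, iters: int = 100) -> np.ndarray:
    """Multiplicative updates for min_x ||y - D x||^2 + sparsity * sum(x), with x >= 0."""
    x = np.full(D.shape[1], 1.0 / D.shape[1])
    for _ in range(iters):
        numer = D.T @ y
        denom = D.T @ (D @ x) + sparsity + 1e-12
        x *= numer / denom
        # In the hierarchical variant, exemplars whose weight keeps growing would be
        # replaced by their two child nodes here, gradually refining the active set.
    return x

rng = np.random.default_rng(1)
D = np.abs(rng.normal(size=(20, 50)))      # 50 nonnegative exemplars of dimension 20
y = 0.8 * D[:, 3] + 0.2 * D[:, 10]         # observation built from two exemplars
x = sparse_code(D, y)
print(np.argsort(x)[-2:])                  # should (roughly) recover indices 10 and 3
```

Starting from a small active set and only descending the hierarchy where weights grow is what keeps the method tractable for dictionaries with hundreds of thousands of exemplars.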
Citations: 4