
Latest publications from the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding

Employing web search query click logs for multi-domain spoken language understanding
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163968
Dilek Z. Hakkani-Tür, Gökhan Tür, Larry Heck, Asli Celikyilmaz, Ashley Fidler, D. Hillard, R. Iyer, S. Parthasarathy
Logs of user queries from a search engine (such as Bing or Google) together with the links clicked provide valuable implicit feedback to improve statistical spoken language understanding (SLU) models. In this work, we propose to enrich the existing classification feature set for domain detection with features computed using the click distribution over a set of clicked URLs from search query click logs (QCLs) of user utterances. Since the form of natural language utterances differs stylistically from that of keyword search queries, to be able to match natural language utterances with related search queries, we perform a syntax-based transformation of the original utterances, after filtering out domain-independent salient phrases. This approach results in significant improvements for domain detection, especially when detecting the domains of web-related user utterances.
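As a concrete illustration of the click-distribution idea, here is a minimal sketch (not the authors' implementation; `click_distribution_features`, the toy log, and the URL-to-domain mapping are all hypothetical) that turns the clicked-URL counts for a query into a normalized distribution over domains, usable as extra classification features:

```python
from collections import Counter

def click_distribution_features(query, click_log, domain_of_url):
    # click_log: {query: {url: click_count}}; domain_of_url maps a URL to a domain label.
    clicks = Counter()
    for url, count in click_log.get(query, {}).items():
        clicks[domain_of_url(url)] += count
    total = sum(clicks.values())
    # Normalized click distribution over domains; empty if the query was never seen.
    return {d: c / total for d, c in clicks.items()} if total else {}
```

For a query like "cast of avatar" whose clicks land mostly on movie sites, the resulting distribution concentrates on the movie domain, which is exactly the signal a domain-detection classifier can exploit.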
Citations: 17
Pruning exponential language models
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163937
Stanley F. Chen, A. Sethy, B. Ramabhadran
Language model pruning is an essential technology for speech applications running on resource-constrained devices, and many pruning algorithms have been developed for conventional word n-gram models. However, while exponential language models can give superior performance, there has been little work on the pruning of these models. In this paper, we propose several pruning algorithms for general exponential language models. We show that our best algorithm applied to an exponential n-gram model outperforms existing n-gram model pruning algorithms by up to 0.4% absolute in speech recognition word-error rate on Wall Street Journal and Broadcast News data sets. In addition, we show that Model M, an exponential class-based language model, retains its performance improvement over conventional word n-gram models when pruned to equal size, with gains of up to 2.5% absolute in word-error rate.
Citations: 4
Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163908
D. Gillick, L. Gillick, S. Wegmann
We describe a series of experiments simulating data from the standard Hidden Markov Model (HMM) framework used for speech recognition. Starting with a set of test transcriptions, we begin by simulating every step of the generative process. In each subsequent experiment, we substitute a real component for a simulated component (real state durations rather than simulating from the transition models, for example), and compare the word error rates of the resulting data, thus quantifying the relative costs of each modeling assumption. A novel sampling process allows us to test the independence assumptions of the HMM, which appear to present far more serious problems than the other data/model mismatches.
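The generative process being simulated is ancestral sampling from an HMM; the following is a minimal illustration under assumed toy transition and emission tables (`sample_from_hmm` and the dictionaries are hypothetical, not the paper's code):

```python
import random

def sample_from_hmm(trans, emit, start_state, end_state):
    # trans[s] and emit[s] are {next_state: prob} and {symbol: prob} tables.
    state, states, obs = start_state, [], []
    while state != end_state:
        states.append(state)
        symbols, probs = zip(*emit[state].items())
        obs.append(random.choices(symbols, weights=probs)[0])      # emit a symbol
        nxt_states, nxt_probs = zip(*trans[state].items())
        state = random.choices(nxt_states, weights=nxt_probs)[0]   # take a transition
    return states, obs
```

The paper's experiments replace components of this loop one at a time (e.g. substituting real state durations for sampled transitions) to isolate the cost of each modeling assumption.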
Citations: 33
Towards choosing better primes for spoken dialog systems
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163949
José Lopes, M. Eskénazi, I. Trancoso
When humans and computers use the same terms (primes, when they entrain to one another), spoken dialogs proceed more smoothly. The goal of this paper is to describe initial steps we have found that will enable us to eventually automatically choose better primes in spoken dialog system prompts. Two different sets of prompts were used to understand what makes one prime more suitable than another. The impact of the primes chosen in speech recognition was evaluated. In addition, results reveal that users did adopt the new vocabulary introduced in the new system prompts. As a result of this, performance of the system improved, providing clues for the trade off needed when choosing between adequate primes in prompts and speech recognition performance.
Citations: 11
Multi-taper MFCC features for speaker verification using I-vectors
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163886
Md. Jahangir Alam, T. Kinnunen, P. Kenny, P. Ouellet, D. O'Shaughnessy
This paper studies low-variance multi-taper mel-frequency cepstral coefficient (MFCC) features in state-of-the-art speaker verification. MFCC features are usually computed from a Hamming-windowed DFT spectrum. Windowing reduces the bias of the spectrum estimate, but its variance remains high. Recently, low-variance multi-taper MFCC features were studied in speaker verification, with promising preliminary results on the NIST 2002 SRE data using a simple GMM-UBM recognizer. In this study our goal is to validate those findings using an up-to-date i-vector classifier on the latest NIST 2010 SRE data. Our experiments on telephone (det5) and microphone speech (det1, det2, det3 and det4) indicate that the multi-taper approaches perform better than the conventional Hamming window technique.
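A minimal sketch of the multi-taper idea, assuming NumPy and SciPy are available (`multitaper_spectrum` and its defaults are illustrative, not the paper's exact configuration): instead of a single Hamming-windowed periodogram, several orthogonal DPSS (Slepian) tapers are applied to the same frame and the resulting spectra are averaged, which lowers the variance of the estimate:

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(frame, n_tapers=6, nw=3.0):
    # DPSS tapers: shape (n_tapers, frame_length), mutually orthogonal.
    tapers = dpss(len(frame), nw, n_tapers)
    # One periodogram per taper, then average across tapers to cut variance.
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return spectra.mean(axis=0)
```

The averaged spectrum would then feed the usual mel filterbank and DCT stages of MFCC extraction in place of the single-window periodogram.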
Citations: 22
Adapting n-gram maximum entropy language models with conditional entropy regularization
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163934
A. Rastrow, Mark Dredze, S. Khudanpur
Accurate estimates of language model parameters are critical for building quality text generation systems, such as automatic speech recognition. However, text training data for a domain of interest is often unavailable. Instead, we use semi-supervised model adaptation; parameters are estimated using both unlabeled in-domain data (raw speech audio) and labeled out-of-domain data (text). In this work, we present a new semi-supervised language model adaptation procedure for Maximum Entropy models with n-gram features. We augment the conventional maximum likelihood training criterion on out-of-domain text data with an additional term that minimizes conditional entropy on in-domain audio. Additionally, we demonstrate how to compute conditional entropy efficiently on speech lattices using first- and second-order expectation semirings. We demonstrate improvements in word error rate over other adaptation techniques when adapting a maximum entropy language model from broadcast news to MIT lectures.
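The regularized criterion can be illustrated with a toy sketch (hypothetical function names; the paper computes the entropy efficiently over lattices with expectation semirings, not over explicit hypothesis lists as done here): maximize text log-likelihood while penalizing the model's uncertainty on in-domain audio:

```python
import math

def conditional_entropy(posteriors):
    # H(W|x) for one utterance, from hypothesis posteriors p(w|x) summing to 1.
    return -sum(p * math.log(p) for p in posteriors if p > 0)

def regularized_objective(text_loglik, audio_posteriors, gamma):
    # Maximum likelihood criterion on out-of-domain text minus a
    # conditional-entropy penalty on in-domain audio (one posterior
    # list per audio utterance); gamma trades off the two terms.
    penalty = sum(conditional_entropy(p) for p in audio_posteriors)
    return text_loglik - gamma * penalty
```

Intuitively, a model that is confident (low entropy) on in-domain audio scores higher under this objective than one that spreads probability evenly over competing hypotheses.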
Citations: 5
Discriminative reranking of ASR hypotheses with morpholexical and N-best-list features
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163931
H. Sak, M. Saraçlar, Tunga Güngör
This paper explores rich morphological and novel n-best-list features for reranking automatic speech recognition hypotheses. The morpholexical features are defined over the morphological features obtained by using an n-gram language model over lexical and grammatical morphemes in the first-pass. The n-best-list features for each hypothesis are defined using that hypothesis and other alternate hypotheses in an n-best list. Our methodology is to align each hypothesis with other hypotheses one by one using minimum edit distance alignment. This gives us a set of edit operations - substitution, addition and deletion as seen in these alignments. These edit operations constitute our n-best-list features as indicator features. The reranking model is trained using a word error rate sensitive averaged perceptron algorithm introduced in this paper. The proposed methods are evaluated on a Turkish broadcast news transcription task. The baseline systems are word and statistical sub-word systems which also employ morphological features for reranking. We show that morpholexical and n-best-list features are effective in improving the accuracy of the system (0.8%).
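A rough sketch of extracting edit-operation indicator features from a pair of hypotheses; note this uses Python's `difflib` matcher as a stand-in rather than the true minimum-edit-distance alignment described in the paper, and `edit_op_features` is a hypothetical name, not the authors' code:

```python
from difflib import SequenceMatcher

def edit_op_features(hyp, other):
    # Indicator features from edit operations aligning hyp against
    # another hypothesis in the n-best list.
    feats = set()
    for op, i1, i2, j1, j2 in SequenceMatcher(None, hyp, other).get_opcodes():
        if op == "replace":
            pairs = list(zip(hyp[i1:i2], other[j1:j2]))
            for a, b in pairs:
                feats.add(("sub", a, b))        # substitution
            for a in hyp[i1 + len(pairs):i2]:
                feats.add(("del", a))           # leftover deletions
            for b in other[j1 + len(pairs):j2]:
                feats.add(("ins", b))           # leftover insertions
        elif op == "delete":
            for a in hyp[i1:i2]:
                feats.add(("del", a))
        elif op == "insert":
            for b in other[j1:j2]:
                feats.add(("ins", b))
    return feats
```

Aligning each hypothesis against every other hypothesis in the n-best list and pooling the resulting indicator features yields the n-best-list feature set used for reranking.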
Citations: 18
Efficient representation and fast look-up of Maximum Entropy language models
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163936
Jia Cui, Stanley F. Chen, Bowen Zhou
Word class information has long been proven useful in language modeling (LM). However, the improved performance of class-based LMs over word n-gram models generally comes at the cost of increased decoding complexity and model size. In this paper, we propose a modified version of the Maximum Entropy token-based language model of [1] that matches the performance of the best existing class-based models, but which is as fast for decoding as a word n-gram model. In addition, while it is easy to statically combine word n-gram models built on different corpora into a single word n-gram model for fast decoding, it is unknown how to statically combine class-based LMs effectively. Another contribution of this paper is to propose a novel combination method that retains the gain of class-based LMs over word n-gram models. Experimental results on several spoken language translation tasks show that our model performs significantly better than word n-gram models with comparable decoding speed and only a modest increase in model size.
Citations: 1
Speaker adaptation based on speaker-dependent eigenphone estimation
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163904
Wenlin Zhang, Weiqiang Zhang, Bi-cheng Li
Based on speaker-dependent eigenphone estimation, a novel speaker adaptation technique is proposed in this paper. Unlike conventional speaker adaptation approaches, the proposed method explicitly models the phone variations for each speaker through subspace modeling in the phone space. The phone coordinates, which are shared by all speakers, contain correlation information between different phones. During speaker adaptation, two schemes for estimating the new speaker's specific phone variation bases (namely, eigenphones) are derived under the maximum likelihood (ML) criterion and the maximum a posteriori (MAP) criterion, respectively. Supervised speaker adaptation experiments on a Mandarin Chinese continuous speech recognition task show that the new method outperforms both eigenvoice and maximum likelihood linear regression (MLLR) methods when sufficient adaptation data is available.
Citations: 2
Bag of n-gram driven decoding for LVCSR system harnessing
Pub Date : 2011-12-01 DOI: 10.1109/ASRU.2011.6163944
Fethi Bougares, Y. Estève, P. Deléglise, G. Linarès
This paper focuses on the combination of automatic speech recognition systems based on driven decoding paradigms. The driven decoding algorithm (DDA) uses the 1-best hypothesis provided by an auxiliary system as an additional knowledge source in the search algorithm of a primary system. Previous studies showed that DDA outperforms ROVER when the primary system is guided by a more accurate system. In this paper we propose a new method that manages auxiliary transcriptions as a bag of n-grams (BONG), without temporal matching. These modifications make it easier to combine several hypotheses provided by different auxiliary systems. Using BONG combination with hypotheses provided by two auxiliary systems, each of which obtained a WER of more than 23% on the same data, our experiments show that a CMU Sphinx based ASR system can reduce its WER from 19.85% to 18.66%, which is better than the results reached with DDA or classical ROVER combination.
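The bag-of-n-grams representation itself is straightforward; a minimal sketch (hypothetical helper, not the authors' implementation) that collects all n-grams of an auxiliary 1-best hypothesis with no temporal information attached:

```python
from collections import Counter

def bag_of_ngrams(words, max_n=3):
    # Order-free multiset of all n-grams up to max_n from one hypothesis.
    bag = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            bag[tuple(words[i:i + n])] += 1
    return bag
```

Because the bag discards timing, bags from several auxiliary systems can simply be merged (e.g. by adding the Counters) before being used to bias the primary system's search.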
Citations: 10