Improving the Automatic Speech Recognition through the Improvement of Language Models

IberSPEECH Conference | Pub Date: 2018-11-21 | DOI: 10.21437/IBERSPEECH.2018-8

A. Martín, C. García-Mateo, Laura Docío Fernández

Citations: 0

Abstract

Language models are one of the pillars on which the performance of automatic speech recognition systems is based. Statistical language models that use word sequence probabilities (n-grams) are the most common, although deep neural networks are now also beginning to be applied, which has become possible thanks to increases in computational power and improvements in algorithms. This paper addresses the impact that language models have on recognition results in two situations: 1) when they are adapted to the working environment of the final application, and 2) when their complexity grows, either through an increase in the order of the n-gram models or through the application of deep neural networks. Specifically, an automatic speech recognition system with different language models is applied to audio recordings from three experimental frameworks: formal orality, speech in newscasts, and TED talks in Galician. Experimental results show that improving the quality of the language models yields improvements in recognition performance.
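As a rough illustration of the first situation (adapting the language model to the target domain), the sketch below trains two toy bigram models and compares their perplexity on a newscast-like sentence. It is not the authors' system: the tiny English corpora, the function names `train_bigram` and `perplexity`, the add-k smoothing, and the shared closed vocabulary are all assumptions made for the example, since the abstract does not specify the toolkit, smoothing method, or data used.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Return (context counts, bigram counts, vocabulary) for a corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams, vocab


def perplexity(model, sentences, vocab_size, k=1.0):
    """Perplexity of the sentences under an add-k smoothed bigram model."""
    contexts, bigrams, _ = model
    log_prob, n_events = 0.0, 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            # Add-k smoothing keeps unseen bigrams from getting zero probability.
            p = (bigrams[(prev, word)] + k) / (contexts[prev] + k * vocab_size)
            log_prob += math.log(p)
            n_events += 1
    return math.exp(-log_prob / n_events)


# Hypothetical toy corpora standing in for two domains.
news_corpus = [
    "the government announced new measures",
    "the minister announced the budget",
]
talk_corpus = [
    "today i want to share an idea",
    "i want to talk about an idea",
]
test_sentences = ["the minister announced new measures"]  # newscast-like

news_model = train_bigram(news_corpus)
talk_model = train_bigram(talk_corpus)

# Simplification: both models share one closed vocabulary built from all data.
vocab_size = len(news_model[2] | talk_model[2])

ppl_in = perplexity(news_model, test_sentences, vocab_size)
ppl_out = perplexity(talk_model, test_sentences, vocab_size)
print(f"in-domain perplexity:     {ppl_in:.2f}")
print(f"out-of-domain perplexity: {ppl_out:.2f}")
```

On this toy data the model trained on newscast-like text assigns the test sentence a lower perplexity than the model trained on talk-like text. Lower perplexity does not guarantee a lower word error rate, but it is a common proxy for how well a language model matches the target domain.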
