
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01) — Latest Publications

Speech data retrieval system constructed on a universal phonetic code domain
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034652
K. Tanaka, Y. Itoh, H. Kojima, Nahoko Fujimura
We propose a novel speech processing framework, where all of the speech data are encoded into universal phonetic code (UPC) sequences and speech processing systems, such as speech recognition, retrieval, digesting, etc., are constructed on this UPC domain. As the first step, we introduce a sub-phonetic segment (SPS) set, based on IPA (international phonetic alphabet), to deal with multilingual speech and develop a procedure to estimate acoustic models of the SPS from IPA-like phone models. The key point of the framework is to employ environment adaptation into the SPS encoding stage. This makes it possible to normalize acoustic variations and extract the language factor contained in speech signals as encoded SPS sequences. We confirm these characteristics by constructing a speech retrieval system on the SPS domain. The system can retrieve key phrases, given by speech, from different environment speech data in a vocabulary-free condition. We show several preliminary experimental results on this system, using Japanese and English sentence speech sets.
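To make the retrieval idea concrete, here is a minimal sketch (not the authors' system): utterances and key phrases are assumed to be already encoded as phonetic code sequences, and a spoken query is located in longer utterances by approximate substring matching over those sequences. `substring_edit_distance` and `retrieve` are hypothetical helper names introduced for illustration.

```python
def substring_edit_distance(query, target):
    """Minimum edit distance between `query` and any substring of `target`.

    Illustrates the matching step of code-sequence retrieval: an encoded
    key phrase is located inside an encoded utterance even when the
    encoder has substituted, inserted, or deleted a few codes.
    """
    m, n = len(query), len(target)
    # First DP row is all zeros so the match may start anywhere in `target`.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # cost i = deleting i query codes
        for j in range(1, n + 1):
            sub = prev[j - 1] + (query[i - 1] != target[j - 1])
            curr[j] = min(sub, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    # Taking the min over the last row lets the match end anywhere, too.
    return min(prev)

def retrieve(query_codes, database):
    """Rank database utterances by how well they contain the key phrase."""
    scored = [(substring_edit_distance(query_codes, codes), name)
              for name, codes in database]
    return sorted(scored)
```

An exact occurrence of the query scores 0, so utterances containing the key phrase sort to the front regardless of vocabulary, which is the sense in which such retrieval is "vocabulary-free".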
Citations: 23
Ubiquitous speech communication interface
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034595
B. Juang
The Holy Grail of telecommunication is to bring people thousands of miles apart, anytime, anywhere, together to communicate as if they were having a face-to-face conversation in a ubiquitous telepresence scenario. One key component necessary to reach this Holy Grail is the technology that supports hands-free speech communication. Hands-free telecommunication (both telephony and teleconferencing) refers to a communication mode in which the participants interact with each other over a communication network, without having to wear or hold any special device. For speech communications, we normally need a loudspeaker, a microphone or a headset. The goal of hands-free speech communication is thus to provide the users with an intelligent voice interface, which provides high quality communication and is safe, convenient, and natural to use. This goal stipulates many challenging technical issues, such as multiple sound sources, echo and reverberation in the room, and natural human-machine interaction, the resolution of which needs to be integrated into a working system before the benefit of hands-free telecommunication can be realized. We analyze these issues and review progress made in the last two decades, particularly from the viewpoint of signal acquisition, restoration and enhancement. We lay out new technical dimensions that may lead to further advances towards realization of a truly ubiquitous speech communication interface to an intelligent information source, be it a human or a machine.
Citations: 1
Dynamic sharings of Gaussian densities using phonetic features
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034675
Kyung-Tak Lee, C. Wellekens
This paper describes a way to adapt the recognizer to pronunciation variability by dynamically sharing Gaussian densities across phonetic models. The method is divided into three steps. First, given an input utterance, an HMM recognizer outputs a lattice of the most likely word hypotheses. Then, the canonical pronunciation of each hypothesis is checked by comparing its theoretical phonetic features to those automatically extracted from speech. If the comparisons show that a phoneme of a hypothesis has likely been pronounced differently, its model is transformed by sharing its Gaussian densities with the ones of its possible alternate phone realization(s). Finally, the transformed models are used in a second-pass recognition. Sharings are dynamic because they are automatically adapted to each input speech. Experiments showed a 5.4% relative reduction in word error rate compared to the baseline and a 2.7% compared to a static method.
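The model transformation in the third step can be pictured with a toy sketch: a hypothetical `share_gaussians` helper pools the Gaussian components of a canonical phone model with those of an alternate realization, reweighting the mixture so it still sums to one. The `alt_weight` knob and the flat reweighting are illustrative assumptions, not the paper's actual transformation.

```python
def share_gaussians(canonical, alternate, alt_weight=0.5):
    """Pool the Gaussian components of two phone models.

    Each model is a list of (mixture_weight, mean, variance) tuples.
    The canonical phone keeps (1 - alt_weight) of the probability mass
    and the alternate realization contributes the rest, so the combined
    mixture weights still sum to one.
    """
    shared = [(w * (1.0 - alt_weight), m, v) for w, m, v in canonical]
    shared += [(w * alt_weight, m, v) for w, m, v in alternate]
    return shared
```

Because the sharing is recomputed per utterance from the phonetic-feature comparison, the transformed model exists only for the second pass, which is what makes the scheme dynamic rather than a fixed pronunciation-variant lexicon.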
Citations: 2
Continuous multi-band speech recognition using Bayesian networks
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034584
K. Daoudi, D. Fohr, Christophe Antoine
Using the Bayesian networks framework, we present a new multi-band approach for continuous speech recognition. This new approach has the advantage of overcoming all the limitations of the standard multi-band techniques. Moreover, it leads to a higher fidelity speech modeling than HMMs. We provide a preliminary evaluation of the performance of our new approach on a connected digits recognition task.
Citations: 17
Bridging the gap between mixed-initiative dialogs and reusable sub-dialogs
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034641
S. Kronenberg, P. Regel-Brietzman
To ease the development process for dialog systems, it is desirable that reusable dialog components provide pre-packaged functionality 'out-of-the-box' that enables developers to quickly build applications by providing standard default settings and behavior. Additionally, human-computer interaction should become more human-like in that mixed-initiative dialogs are supported. Mixed-initiative interaction requires the system to react to user-initiated, application-specific commands, whereby reusable dialog components have to be application-independent to be used in different settings. This article presents a dialog mechanism, the so-called meta-dialog, which is responsible for the control flow between reusable sub-dialogs and mixed-initiative dialogs.
Citations: 2
Multispeaker speech activity detection for the ICSI meeting recorder
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034599
T. Pfau, Daniel P. W. Ellis, Andreas Stolcke
As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.
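A toy sketch of the normalization idea (not the paper's HMM-based detector): per-channel energy features are mean/variance normalized so that microphone gain differences cancel, and a frame is attributed to the channel whose normalized energy dominates, on the assumption that energy appearing equally on all channels is crosstalk or background. All function names and the 0.5 threshold are illustrative assumptions.

```python
import math

def frame_energies(signal, frame_len):
    """Short-time log energies over non-overlapping frames."""
    return [math.log(sum(x * x for x in signal[i:i + frame_len]) + 1e-10)
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def normalize(feats):
    """Per-channel mean/variance normalization, so gain differences
    between microphones do not dominate the cross-channel comparison."""
    mu = sum(feats) / len(feats)
    sd = math.sqrt(sum((f - mu) ** 2 for f in feats) / len(feats)) or 1.0
    return [(f - mu) / sd for f in feats]

def local_speech_frames(channels, frame_len):
    """Attribute each frame to the channel whose normalized energy
    dominates; frames where no channel stands out are labeled None."""
    norm = [normalize(frame_energies(ch, frame_len)) for ch in channels]
    labels = []
    for t in range(len(norm[0])):
        vals = [norm[c][t] for c in range(len(norm))]
        best = max(range(len(vals)), key=lambda c: vals[c])
        labels.append(best if vals[best] > 0.5 else None)  # illustrative threshold
    return labels
```

The paper goes further by modeling speech/non-speech with Gaussian-mixture HMM states and adding cross-correlation features, but even this crude per-frame comparison shows why raw energy alone fails: without normalization, a hot microphone would win every frame.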
Citations: 114
Out-of-vocabulary word modeling using multiple lexical fillers
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034628
Gilles Boulianne, P. Dumouchel
In large vocabulary speech recognition, out-of-vocabulary words are an important cause of errors. We describe a lexical filler model that can be used in a single pass recognition system to detect out-of-vocabulary words and reduce the error rate. When rescoring word graphs with better acoustic models, word fillers cause a combinatorial explosion. We introduce a new technique, using several thousand lexical fillers, which produces word graphs that can be rescored efficiently. On a large French vocabulary continuous speech recognition task, lexical fillers achieved an OOV detection rate of 44% and allowed a 23% reduction in errors due to OOV words.
Citations: 4
Automatic selection of transcribed training material
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034673
T. Kamm, Gerald G. Meyer
Conventional wisdom says that incorporating more training data is the surest way to reduce the error rate of a speech recognition system. This, in turn, guarantees that speech recognition systems are expensive to train, because of the high cost of annotating training data. We propose an iterative training algorithm that seeks to improve the error rate of a speech recognizer without incurring additional transcription cost, by selecting a subset of the already available transcribed training data. We apply the proposed algorithm to an alpha-digit recognition problem and reduce the error rate from 10.3% to 9.4% on a particular test set.
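The abstract does not spell out the selection criterion, so the loop below is only a generic sketch of iterative subset selection: train on the current subset, score every utterance under the resulting model, and keep the best-scoring fraction. `train`, `score`, `rounds`, and `keep_frac` are placeholder assumptions standing in for a real recognizer's training routine and per-utterance likelihood.

```python
def select_training_subset(utterances, train, score, rounds=3, keep_frac=0.9):
    """Iteratively shrink the training set to its best-fitting subset.

    `train(subset)` returns a model; `score(model, utt)` returns a
    goodness-of-fit value (higher is better).  Each round discards the
    worst-scoring (1 - keep_frac) of the current subset, on the premise
    that poorly fitting utterances (e.g. mislabeled or atypical data)
    hurt the model more than they help.
    """
    subset = list(utterances)
    for _ in range(rounds):
        model = train(subset)
        ranked = sorted(subset, key=lambda u: score(model, u), reverse=True)
        subset = ranked[:max(1, int(len(ranked) * keep_frac))]
    return subset
```

With a toy "model" that is just the mean of numeric utterances and a score that penalizes distance from it, a gross outlier is dropped in the first round, which mirrors the intuition that some transcribed data can be counterproductive.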
Citations: 16
Grammar learning for spoken language understanding
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034645
Ye-Yi Wang, A. Acero
Many state-of-the-art conversational systems use semantic-based robust understanding and manually derived grammars, a very time-consuming and error-prone process. This paper describes a machine-aided grammar authoring system that enables a programmer to develop rapidly a high quality grammar for conversational systems. This is achieved with a combination of domain-specific semantics, a library grammar, syntactic constraints and a small number of example sentences that have been semantically annotated. Our experiments show that the learned semantic grammars consistently outperform manually authored grammars, requiring much less authoring load.
Citations: 30
The symbiosis of DSP and speech recognition or an outsider's view of the inside
Pub Date: 2001-12-09 DOI: 10.1109/ASRU.2001.1034575
J. Kaiser
From a historical review of how we got to where we are now, we discuss the interrelationship between our system design objectives and goals, our modeling of the speech signal and its generation and parameterization, and the broadly developing DSP methodology. We take a critical look at some of the underlying assumptions in our modeling to see if they may be limiting the performance that can be obtained with ASR (automatic speech recognition) systems. We close with some open questions and challenges for new work.
Citations: 1