
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01: Latest Papers

Language modeling for multi-domain speech-driven text retrieval
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034653
K. Itou, Atsushi Fujii, Tetsuya Ishikawa
We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak contents related to a target collection, we produce language models used for speech recognition based on the target collection, so as to improve both the recognition and retrieval accuracy. Experiments using existing test collections combined with dictated queries showed the effectiveness of our method.
Citations: 11
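
The idea in the entry above, building the recognizer's language model from the target collection itself, can be illustrated with a minimal sketch. It assumes a bigram model with add-one smoothing and whitespace tokenization; the abstract does not specify the model order, smoothing, or tokenizer, and `train_bigram_lm` and the toy `collection` are hypothetical names used only for illustration.

```python
import math
from collections import Counter

def train_bigram_lm(documents):
    """Estimate add-one-smoothed bigram log-probabilities from a text collection."""
    history_counts = Counter()   # how often each word appears as a history
    bigram_counts = Counter()    # how often each (history, word) pair appears
    vocab = set()                # every word that can be predicted, incl. </s>
    for doc in documents:
        tokens = ["<s>"] + doc.lower().split() + ["</s>"]
        vocab.update(tokens[1:])
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def log_prob(history, word):
        # Add-one smoothing (an assumption, not the authors' stated choice).
        numerator = bigram_counts[(history, word)] + 1
        denominator = history_counts[history] + len(vocab)
        return math.log(numerator / denominator)

    return log_prob

# Toy stand-in for the target document collection (hypothetical data).
collection = [
    "speech recognition improves retrieval",
    "speech retrieval uses spoken queries",
]
log_prob = train_bigram_lm(collection)
```

Word sequences that occur in the collection receive higher probability than unseen ones, which is how a collection-derived model can bias recognition toward in-domain queries.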
Very large vocabulary proper name recognition for directory assistance
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034627
F. Béchet, R. de Mori, G. Subsol
This paper deals with the difficult task of recognition of a large vocabulary of proper names in a directory assistance application. After a presentation of the related work, it introduces a methodology for rescoring the N-best hypotheses generated by a first step recognition. First experiments give encouraging results and several topics for future research are presented.
Citations: 14
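
N-best rescoring, as mentioned in the entry above, generally means re-ranking the first-pass hypotheses with an additional knowledge source. The sketch below assumes a simple linear interpolation of log scores; the abstract does not describe the actual rescoring methodology, so `rescore_nbest`, the interpolation `weight`, and the toy directory prior are all hypothetical.

```python
from typing import Callable, List, Tuple

def rescore_nbest(
    nbest: List[Tuple[str, float]],
    second_pass_score: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, first_pass_log_score) pairs with a second score."""
    rescored = [
        (hyp, (1.0 - weight) * score + weight * second_pass_score(hyp))
        for hyp, score in nbest
    ]
    # Highest combined log score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical log-prior of names in a directory (illustrative values only).
directory_log_prior = {"martin": -1.0, "marten": -6.0, "martine": -4.0}

# Hypothetical first-pass output: the acoustically best name comes first.
nbest = [("marten", -10.0), ("martin", -10.5), ("martine", -12.0)]

best = rescore_nbest(
    nbest, lambda name: directory_log_prior.get(name, -20.0)
)[0][0]
```

In this toy case the directory prior promotes "martin" above the first-pass winner "marten".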
Investigating stochastic speech understanding
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034637
H. Bonneau-Maynard, F. Lefèvre
The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However, corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the performance of the understanding module. The use of automatically annotated data is also investigated as a means to increase the corpus size at a very low cost. First, a stochastic speech understanding model developed using data collected with the LIMSI ARISE dialog system is presented. Its performance is shown to be comparable to that of the rule-based caseframe grammar currently used in the system. In a second step, two ways of reducing the development cost are pursued: (1) reducing the amount of manually annotated data used to train the stochastic models and (2) using automatically annotated data in the training process.
Citations: 20
Speaker-trained recognition using allophonic enrollment models
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034589
V. Vanhoucke, M. Hochberg, C. Leggetter
We introduce a method for performing speaker-trained recognition based on context-dependent allophone models from a large-vocabulary, speaker-independent recognition system. A set of speaker-enrollment templates is selected from the context-dependent allophone models. These templates are used to build representations of the speaker-enrolled utterances. The advantages of this approach include improved performance and portability of the enrollments across different acoustic models. We describe the approach used to select the enrollment templates and how to apply them to speaker-trained recognition. The approach has been evaluated on an over-the-telephone, voice-activated dialing task and shows significant performance improvements over techniques based on context-independent phone models or general acoustic model templates. In addition, the portability of enrollments from one model set to another is shown to result in almost no performance degradation.
Citations: 2
High performance telephone bandwidth speaker independent continuous digit recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034670
P. Cosi, J.-P. Hosom, A. Valente
The development of a high-performance, telephone-bandwidth, speaker-independent connected digit recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing and system architectures were compared; with the best features (MFCC with CMS + Δ) and network (4-layer fully connected feed-forward network), the system reached 98.92% word recognition accuracy and 92.62% sentence recognition accuracy on a test set of the FIELD continuous digits recognition task.
Citations: 9
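
The best front end in the entry above combines MFCCs with cepstral mean subtraction (CMS) and delta coefficients. The sketch below shows what those two post-processing steps typically do to a sequence of cepstral frames. It is only an illustration: the two-frame central difference used for the deltas and the per-utterance mean are assumptions, since the abstract does not give the exact formulas, and all function names are hypothetical.

```python
from typing import List

Frames = List[List[float]]  # one list of cepstral coefficients per time frame

def cepstral_mean_subtraction(frames: Frames) -> Frames:
    """Subtract the per-utterance mean of each coefficient (removes channel bias)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]

def delta(frames: Frames) -> Frames:
    """Central difference (c[t+1] - c[t-1]) / 2, repeating the edge frames."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def add_cms_and_delta(frames: Frames) -> Frames:
    """Return [normalized static coefficients + their deltas] for every frame."""
    normalized = cepstral_mean_subtraction(frames)
    return [static + dyn for static, dyn in zip(normalized, delta(normalized))]

# Toy 2-coefficient "MFCC" sequence (hypothetical values).
toy = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
features = add_cms_and_delta(toy)
```

A constant offset, such as the second coefficient in the toy input, vanishes after CMS, which is why the technique helps with fixed telephone-channel effects.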
Pseudo 2-dimensional hidden Markov models in speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034679
S. Werner, G. Rigoll
In this paper, the usage of pseudo 2-dimensional hidden Markov models for speech recognition is discussed. This image processing method should better model the time-frequency structure in speech signals. The method calculates the emission probability of each state of a standard HMM with an embedded HMM. If a temporal sequence of spectral vectors is imagined as a spectrogram, this leads to a 2-dimensional warping of the spectrogram. This additional warping of the frequency axis could be useful for speaker-independent recognition and can be considered to be similar to a vocal tract normalization. The effects of this paradigm are investigated in this paper using the TI-Digits database.
Citations: 6
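
In a pseudo 2-dimensional HMM as described in the entry above, the emission score of a time-axis state for one spectral vector is itself the likelihood of an embedded HMM run along the frequency axis. The sketch below computes that score with the forward algorithm in the log domain. The left-to-right topology, scalar Gaussian emissions per frequency bin, and shared transition probabilities are assumptions made for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def embedded_emission_logprob(
    frame: List[float],
    states: List[Tuple[float, float]],  # (mean, variance) per embedded state
    log_stay: float,
    log_next: float,
) -> float:
    """Log-likelihood of one spectral vector under a left-to-right embedded HMM
    whose states are traversed along the frequency axis."""
    n_states = len(states)
    # alpha[j]: log-probability of the bins seen so far, ending in state j.
    alpha = [-math.inf] * n_states
    alpha[0] = log_gauss(frame[0], *states[0])
    for x in frame[1:]:
        new_alpha = []
        for j in range(n_states):
            stay = alpha[j] + log_stay
            move = alpha[j - 1] + log_next if j > 0 else -math.inf
            new_alpha.append(logsumexp([stay, move]) + log_gauss(x, *states[j]))
        alpha = new_alpha
    return alpha[-1]  # the path must end in the last embedded state
```

Because each frequency bin can either stay in the current embedded state or advance to the next, the alignment of bins to states stretches or compresses the frequency axis, which is the additional warping the abstract refers to.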
Task-specific adaptation of speech recognition models
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034677
A. Sankar, Ashvin Kannan, B. Shahshahani, E. Jackson
Most published adaptation research focuses on speaker adaptation and on adaptation for noisy channels and background environments. We study acoustic, grammar, and combined acoustic and grammar adaptation for creating task-specific recognition models. Comprehensive experimental results are presented using data from natural language quotes and a trading application. The results show that task adaptation gives substantial improvements in both utterance understanding accuracy and recognition speed.
Citations: 3
An online model adaptation method for compensating speech models for noise in continuous speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034609
R. Lee, E. Choi
This paper presents a method for online model adaptation based on the parallel model combination (PMC) method. The proposed method makes use of the concept of Gaussian model clustering to reduce the computation load required by PMC. This model clustering, in combination with a set of derived transformation equations, provides a potential framework for online model adaptation in noisy speech recognition. The proposed method reduces the computation in adaptation by about 45% with only a slight degradation in improvements of an average 18% for a connected digit task and 9% for a large vocabulary Mandarin task when compared with the standard PMC method.
Citations: 0
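
The computational saving described in the entry above comes from running the model combination once per cluster of Gaussians rather than once per Gaussian. The sketch below illustrates that general idea only. It assumes the log-add approximation to PMC on log-spectral means and a shared additive shift per cluster; the paper's derived transformation equations are not given in the abstract, so this is not the authors' method, and `log_add`, `adapt_by_cluster`, and the data layout are hypothetical.

```python
import math
from typing import Dict, List

def log_add(speech_mean: List[float], noise_mean: List[float]) -> List[float]:
    """Log-add approximation: combine log-spectral means of clean speech and noise."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech_mean, noise_mean)]

def adapt_by_cluster(
    means: Dict[str, List[float]],   # clean log-spectral mean per Gaussian
    clusters: List[List[str]],       # Gaussian ids grouped into clusters
    noise_mean: List[float],
) -> Dict[str, List[float]]:
    """Run the combination once per cluster centroid and reuse the resulting
    shift for every member (an assumed simplification for illustration)."""
    dim = len(noise_mean)
    adapted = {}
    for members in clusters:
        centroid = [sum(means[m][d] for m in members) / len(members)
                    for d in range(dim)]
        combined = log_add(centroid, noise_mean)
        shift = [c_new - c_old for c_new, c_old in zip(combined, centroid)]
        for m in members:
            adapted[m] = [x + s for x, s in zip(means[m], shift)]
    return adapted
```

With this layout the expensive combination step runs once per cluster, so the cost scales with the number of clusters rather than the number of Gaussians, at the price of a less exact compensation for members far from the centroid.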
An open concept metric for assessing dialog system complexity
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034638
T. M. DuBois, Alexander I. Rudnicky
Techniques for assessing dialog system performance commonly focus on characteristics of the interaction, using metrics such as completion, satisfaction or time on task. However, such metrics are not always capable of differentiating systems that operate on fundamentally different principles, particularly when tested on tasks that focus on common-denominator capabilities. We introduce a new metric, the open concept count, and show how it can be used to capture useful system properties of a dialog system.
Citations: 1
Markovian combination of language and prosodic models for better speech understanding and recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034615
A. Stolcke, Elizabeth Shriberg
Summary form only given. Traditionally, "language" models capture only the word sequences of a language. A crucial component of spoken language, however, is its prosody, i.e., its rhythmic and melodic properties. This paper summarizes recent work on integrated, computationally efficient modeling of word sequences and prosodic properties of speech, for a variety of speech recognition and understanding tasks, such as dialog act tagging, disfluency detection, and segmentation into sentences and topics. In each case it turns out that hidden Markov representations of the underlying structures and associated observations arise naturally, and allow existing speech recognizers to be combined with separately trained prosodic classifiers. The same HMM-based models can be used in two modes: to recover hidden structure (such as sentence boundaries), or to evaluate speech recognition hypotheses, thereby integrating prosody into the recognition process.
Citations: 3