
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01: Latest Papers

Shape vector characterization of Vietnamese tones and application to automatic recognition
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034678
Nguyen Quoc-Cuong, Pham Thi Ngoc Yen, E. Castelli
Tone recognition for standard Vietnamese (Hanoi dialect) is described. A wavelet method is used to extract the pitch (F0) from a speech corpus, and on this basis a feature vector for Vietnamese tone recognition is proposed. Hidden Markov models (HMMs) are then used to recognize the tones. Our results suggest that tone recognition is independent of the vowel, but accuracy improves when one of the two monotonous tones is used as the pitch reference. Finally, a first attempt at a complete isolated-word recognition engine adapted for Vietnamese is presented.
Citations: 6
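The abstract does not specify how the shape vector is composed. Here is a minimal sketch under explicit assumptions. The voiced F0 contour of a syllable is resampled to a fixed number of points. Each point is then expressed in semitones relative to a reference pitch, for instance the speaker's average F0 on one of the monotonous tones. The function names, the semitone scale, the five-point default and the convention that unvoiced frames have F0 <= 0 are all illustrative assumptions, not the authors' method.

```python
import math


def semitones(f0_hz, ref_hz):
    """Express an F0 value in semitones relative to a reference pitch."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def resample(contour, n_points):
    """Linearly interpolate a contour onto n_points evenly spaced positions."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    last = len(contour) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)
        i = int(pos)  # floor, since pos >= 0
        if i >= last:
            out.append(float(contour[last]))
        else:
            frac = pos - i
            out.append(contour[i] * (1.0 - frac) + contour[i + 1] * frac)
    return out


def shape_vector(f0_contour, ref_hz, n_points=5):
    """Hypothetical tone shape vector: fixed-length, reference-normalized F0.

    Frames with F0 <= 0 are treated as unvoiced and dropped (an assumption
    about how the pitch tracker marks unvoiced frames).
    """
    voiced = [f for f in f0_contour if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    return [semitones(f, ref_hz) for f in resample(voiced, n_points)]


# Example: a rising contour, normalized to a 200 Hz reference pitch.
print(shape_vector([200, 210, 220, 230, 240], ref_hz=200, n_points=3))
```

A fixed-length, reference-normalized vector like this could feed the tone HMMs. Normalizing against a reference tone is one way to factor out a speaker's pitch range, in the spirit of the abstract's finding about the pitch reference.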
State synchronous modeling of audio-visual information for bi-modal speech recognition
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034671
S. Nakamura, K. Kumatani, S. Tamura
There has recently been growing demand for automatic speech recognition (ASR) systems that operate robustly in acoustically noisy environments. This paper proposes a method for effectively integrating audio and visual information in audio-visual (bi-modal) ASR systems. Such integration necessarily requires modeling the synchronization between the audio and visual information. To address the time lag and the correlation between individual speech and lip-movement features, we introduce an integrated HMM model of audio-visual information based on HMM composition. The proposed model can represent state synchronicity not only within a phoneme but also between phonemes. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech.
Citations: 9
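HMM composition can be illustrated with a product HMM whose states are pairs of audio and visual states. The sketch below makes three assumptions:

- the component HMMs are left-to-right;
- composite transition probabilities are products of the component probabilities;
- state pairs whose indices differ by more than a hypothetical `max_lag` parameter are pruned, which bounds audio/visual asynchrony.

This is a generic illustration of the idea, not the specific model proposed in the paper. It also omits emission probabilities, which would typically combine audio and visual likelihoods, possibly with stream weights.

```python
from itertools import product


def compose_hmms(a_audio, a_visual, max_lag=1):
    """Compose audio and visual HMM transition matrices into a product HMM.

    Each composite state is a pair (i, j): audio state i, visual state j.
    Pairs with |i - j| > max_lag are pruned (hypothetical asynchrony limit).
    Composite transition probabilities are products of the component
    probabilities, renormalized over the surviving pairs.
    """
    states = [
        (i, j)
        for i, j in product(range(len(a_audio)), range(len(a_visual)))
        if abs(i - j) <= max_lag
    ]
    trans = []
    for i, j in states:
        row = [a_audio[i][k] * a_visual[j][l] for k, l in states]
        total = sum(row)
        # Renormalize so each row is a probability distribution again.
        trans.append([p / total for p in row] if total > 0 else row)
    return states, trans


# Example: two 3-state left-to-right HMMs with self-loops.
a = [[0.6, 0.4, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
states, trans = compose_hmms(a, a, max_lag=1)
for s, row in zip(states, trans):
    print(s, [round(p, 3) for p in row])
```

Pruning by index difference assumes the two streams have comparable numbers of states. Setting `max_lag=0` forces strict state synchrony, and a large `max_lag` recovers the full product HMM.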
Brancusi, neo-plasticism, and the art of designing speech-recognition applications
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034577
B. Kotelly
Designing over-the-phone speech-recognition systems requires that designers have a design methodology and philosophy that enables them to understand how to research, design, evaluate and re-design their application.
Citations: 1
Recursive noise estimation using iterative stochastic approximation for stereo-based robust speech recognition
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034594
L. Deng, J. Droppo, A. Acero
We present an algorithm for recursive estimation of parameters in a mildly nonlinear model involving incomplete data. In particular, we focus on the time-varying deterministic parameters of additive noise in the nonlinear model. For the nonstationary noise encountered in robust speech recognition, different observation data segments correspond to different noise parameter values. Hence, recursive estimation algorithms are more desirable than batch algorithms, since they can be designed to adaptively track the changing noise parameters. One such design, based on the iterative stochastic approximation algorithm in the recursive-EM framework, is described. This new algorithm jointly adapts the time-varying noise parameters and the auxiliary parameters introduced to give a linear approximation of the nonlinear model. We present stereo-based robust speech recognition results for the AURORA task, which demonstrate the effectiveness of the new algorithm compared with a more traditional MMSE noise estimation technique under otherwise identical experimental conditions.
Citations: 16
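The abstract refers to auxiliary parameters introduced to linearize a mildly nonlinear noise model. Assume the log-spectral additive-noise model commonly used in this area: clean speech x, noise n and noisy observation y are log powers in one frequency bin, with phase ignored. A first-order Taylor expansion around an expansion point (x_0, n_0) then gives the linear form below. Treating (x_0, n_0) as the auxiliary parameters is an assumption made for illustration, and the abstract does not state the paper's exact model.

```latex
\begin{aligned}
y &= x + \log\!\left(1 + e^{\,n - x}\right) = \log\!\left(e^{x} + e^{n}\right) \\
y &\approx \log\!\left(e^{x_0} + e^{n_0}\right) + (1 - \sigma_0)(x - x_0) + \sigma_0\,(n - n_0), \\
\sigma_0 &= \frac{e^{n_0}}{e^{x_0} + e^{n_0}} = \frac{1}{1 + e^{\,x_0 - n_0}}
\end{aligned}
```

Under this approximation y is linear in n, which makes recursive updates of the noise parameter tractable. Re-centering the expansion point on the latest estimate is one plausible way the auxiliary parameters could be adapted jointly with the noise.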
Developing the ETSI Aurora advanced distributed speech recognition front-end and what next?
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034605
David Pearce
The ETSI STQ-Aurora DSR working group is developing the standard for the advanced DSR front-end. One of the main goals of the advanced front-end is improved robustness to noise compared with the existing ETSI DSR standard for the Mel-cepstrum front-end. The purpose of the paper is first to inform the wider speech research community about this activity, and then to promote discussion of what further needs there are for DSR front-end standards. The paper describes the scope of the DSR standard and the set of performance requirements that Aurora has specified for the advanced front-end. An important part of this is the evaluation and characterisation of the performance of candidate front-ends on noisy databases, and an overview of these evaluations is given. As the competition to select the best proposal draws to a close (submission deadline 28th November 2001), an interesting question is: "What next?"
Citations: 39
An examination of three classes of ASR dialogue systems: PC-based dictation, in-car systems and automated directory assistance
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034683
M. Hunt
Three classes of practical speech recognition dialogue systems are considered, starting with PC-based systems, specifically dictation systems. Although such systems have become very effective, they have not achieved mainstream use, and some reasons for this disappointing outcome are proposed. Speech recognition is now appearing in production cars. It is argued that the two most attractive in-car applications are navigation systems and dialing by name. The latter may be better suited to equipment that can be detached from the car and connected to a PC. After telephone applications in general are considered, the importance of automated DA (directory assistance, also called directory enquiries or DQ in some countries) is established and its particular challenges are discussed. These include the size and dynamic nature of the databases accessed, and the variation in how callers name the commercial or administrative entity whose number they are seeking. The advantages of a bottom-up phonetic speech recognition technique for automated DA are described. It is concluded that combining this technique with automatic methods for handling name variation makes automated DA, including access to business listings, a practical proposition.
Citations: 2
Semantic modeling for dialog systems in a pattern recognition framework
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034643
Kuansan Wang
In this paper, we describe a multimodal dialog system based on the pattern recognition framework that has been successfully applied to automatic speech recognition. We treat the dialog problem as one of recognizing the optimal action given the user's input and the context. Analogous to the acoustic, pronunciation, and language models used in speech recognition, the dialog system in this framework has language, semantic, and behavior models that it takes into account when searching for the best result. The paper focuses on our approach to semantic modeling, describing how the semantic lexicon and domain knowledge are derived and integrated. We show that, once semantic abstraction is introduced, multimodal integration can be achieved using the reference resolution algorithm developed for natural language understanding. Several applications developed to test various aspects of the proposed framework are also described.
Citations: 3
A one-pass decoder based on polymorphic linguistic context assignment
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034625
H. Soltau, Florian Metze, C. Fugen, A. Waibel
In this study, we examine how fast decoding of large-vocabulary conversational speech profits from efficient use of linguistic information, i.e. language models and grammars. Based on a re-entrant single pronunciation prefix tree, we use the concept of linguistic context polymorphism to allow early incorporation of language model information. This approach lets us use all available language model information in a one-pass decoder. The same engine decodes efficiently with statistical n-gram language models or with context-free grammars, and it also rescores lattices. We compare this approach to our previous decoder, which needed three passes to incorporate all available information. Results on a very large vocabulary task show that the search can be sped up by almost a factor of three without introducing additional search errors.
Citations: 240
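A single pronunciation prefix tree shares common phone prefixes among words. As a result, a word's identity, and therefore its language model score, is only known at the leaves. A widely used way to apply LM information earlier is LM lookahead: each tree node carries the best LM score of any word reachable below it. The sketch below shows unigram lookahead on a hypothetical toy lexicon. It is not the paper's polymorphic context assignment, which must also handle how lookahead depends on the linguistic context (the word history) in a re-entrant tree.

```python
import math


class Node:
    """One node of a pronunciation prefix tree."""

    def __init__(self):
        self.children = {}           # phone -> Node
        self.words = []              # words whose pronunciation ends here
        self.lookahead = -math.inf   # best LM log-prob of any word below


def build_prefix_tree(lexicon):
    """Build a prefix tree from {word: [phones]}; shared prefixes share nodes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def set_lookahead(node, lm_logprob):
    """Store at each node the max LM log-prob over words reachable from it."""
    best = max((lm_logprob[w] for w in node.words), default=-math.inf)
    for child in node.children.values():
        best = max(best, set_lookahead(child, lm_logprob))
    node.lookahead = best
    return best


# Hypothetical toy lexicon and unigram probabilities.
lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "ao", "g"]}
unigram = {"cat": math.log(0.5), "cap": math.log(0.1), "dog": math.log(0.4)}
root = build_prefix_tree(lexicon)
set_lookahead(root, unigram)
# Partial hypotheses starting with "k" can already be scored with log(0.5).
print(math.exp(root.children["k"].lookahead))
```

With an n-gram model the lookahead values change with the predecessor word, so they cannot be stored once per node as above. Handling that dependency efficiently is the problem the paper's approach targets.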
Robust analysis of spoken input combining statistical and knowledge-based information sources
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034658
R. Cattoni, Marcello Federico, A. Lavie
The paper is concerned with analyzing automatic transcriptions of spoken input into an interlingua formalism for a speech-to-speech machine translation system. This process is based on two sub-tasks: (1) recognition of the domain action (a speech act and a sequence of concepts); (2) extraction of arguments consisting of feature-value information. Statistical models are used for the former, while a knowledge-based approach is employed for the latter. The paper proposes an algorithm that improves the analysis in terms of robustness and performance. It combines the scores of the statistical models with the extracted arguments, taking into account the well-formedness constraints defined by the interlingua formalism.
Citations: 9
Some experiments on the use of one-channel noise reduction techniques with the Italian SpeechDat Car database
Pub Date: 2001-12-09; DOI: 10.1109/ASRU.2001.1034607
M. Matassoni, G. Mian, M. Omologo, A. Santarelli, P. Svaizer
The use of noise reduction techniques for hands-free speech recognition in a car environment is investigated. A set of experiments was carried out using different speech enhancement algorithms based on noise estimation. In particular, linear spectral subtraction and MMSE estimators are considered with various parameter settings. Experiments were conducted on connected and isolated digits extracted from the Italian version of the SpeechDat Car database. Recognition rates do not agree with the acoustically perceived quality of the noise reduction. The best performance is obtained by spectral subtraction with a suitable choice of the oversubtraction factor and a quantile noise estimator. This configuration reduces the digit error rate by more than 30% relative. Digit recognition accuracy rises from 94.4% for the baseline to 96.2%, which means errors fall from 5.6% to 3.8%.
Citations: 8
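Power spectral subtraction with an oversubtraction factor can be sketched in three steps. First, the noise power in each frequency bin is estimated as a quantile of that bin's power over all frames of the utterance. Quantile-based estimation rests on the idea that each bin is dominated by noise for a large fraction of the time. Second, `alpha` times the estimate is subtracted. Third, the result is floored at `beta` times the noise estimate. The parameter names `alpha`, `beta` and `q` and their defaults are assumptions for illustration, since the abstract does not report the settings the authors found best. The input is assumed to be a list of power spectra, one per frame.

```python
def quantile(values, q):
    """q-quantile by sorting, using the nearest rank (no interpolation)."""
    s = sorted(values)
    k = min(len(s) - 1, max(0, round(q * (len(s) - 1))))
    return s[k]


def quantile_noise_estimate(frames, q=0.5):
    """Per-bin noise power: the q-quantile of each bin's power across frames."""
    n_bins = len(frames[0])
    return [quantile([f[b] for f in frames], q) for b in range(n_bins)]


def spectral_subtract(frames, alpha=2.0, beta=0.01, q=0.5):
    """Power spectral subtraction with oversubtraction and a spectral floor.

    frames: list of power spectra (one list of bin powers per frame).
    alpha:  oversubtraction factor (alpha > 1 removes more than the estimate).
    beta:   floor, as a fraction of the noise estimate, to avoid negative power.
    q:      quantile used for the noise estimate (hypothetical default).
    """
    noise = quantile_noise_estimate(frames, q)
    return [
        [max(p - alpha * n, beta * n) for p, n in zip(frame, noise)]
        for frame in frames
    ]


# Toy input: three frames, two bins; bin 0 has one loud (speech-like) frame.
frames = [[1.0, 4.0], [1.0, 4.0], [10.0, 4.0]]
print(spectral_subtract(frames))
```

In this toy input the loud third frame survives in bin 0, while the steady background is pushed down to the floor. Flooring relative to the noisy power instead of the noise estimate is an equally common variant.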