
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01) — Latest Publications

Recognition experiments with the SpeechDat-Car Aurora Spanish database using 8 kHz- and 16 kHz-sampled signals
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034606
C. Nadeu, M. Tolos
Like the other SpeechDat-Car databases, the Spanish one has been collected using a 16 kHz sampling frequency, and several microphone positions and environmental noises. We aim at clarifying whether there is any advantage in terms of recognition performance from processing the 16 kHz-sampled signals instead of the usual 8 kHz-sampled ones. Recognition tests have been carried out within the Aurora experimental framework, which includes signals from both a close-talking microphone and a distant microphone. Our preliminary results indicate that it is possible to get a performance improvement from the increased bandwidth in the noisy car environment.
Citations: 3
Collaborative steering of microphone array and video camera toward multi-lingual tele-conference through speech-to-speech translation
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034602
T. Nishiura, R. Gruhn, S. Nakamura
It is very important for multilingual teleconferencing through speech-to-speech translation to capture distant-talking speech with high quality. In addition, the speaker image is also needed to realize a natural communication in such a conference. A microphone array is an ideal candidate for capturing distant-talking speech. Uttered speech can be enhanced and speaker images can be captured by steering a microphone array and a video camera in the speaker direction. However, to realize automatic steering, it is necessary to localize the talker. To overcome this problem, we propose collaborative steering of the microphone array and the video camera in real-time for a multilingual teleconference through speech-to-speech translation. We conducted experiments in a real room environment. The speaker localization rate (i.e., speaker image capturing rate) was 97.7%, speech recognition rate was 90.0%, and TOEIC score was 530~540 points, subject to locating the speaker at a 2.0 meter distance from the microphone array.
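The steering described above hinges on localizing the talker. A minimal sketch of the core ingredient — estimating the time difference of arrival (TDOA) between two microphones by cross-correlation — might look like the following; this is an illustration of the general technique, not the paper's actual system, and the signal and delay values are hypothetical:

```python
import numpy as np

# Two-microphone TDOA estimation by cross-correlation (illustrative sketch).
fs = 16000            # sample rate in Hz (hypothetical)
delay = 12            # true delay in samples between the two mics (hypothetical)
rng = np.random.default_rng(1)
sig = rng.normal(size=4096)          # stand-in for a speech frame
mic1 = sig
mic2 = np.roll(sig, delay)           # second mic hears the signal `delay` samples later

# Lag with maximum cross-correlation estimates the inter-mic delay,
# from which the array/camera steering direction can be derived.
corr = np.correlate(mic2, mic1, mode='full')
est = corr.argmax() - (len(mic1) - 1)    # recovered delay in samples
tau = est / fs                            # recovered delay in seconds
```

With the delay in hand, the bearing follows from the array geometry (e.g. arcsin of delay times sound speed over mic spacing), which is what lets the array and camera be pointed at the talker.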
Citations: 7
Transducer composition for "on-the-fly" lexicon and language model integration
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034667
D. Caseiro, I. Trancoso
We present the use of a specialized composition algorithm that allows the generation of a determinized search network for ASR in a single step. The algorithm is exact in the sense that the result is determinized when the lexicon and the language model are represented as determinized transducers. The composition and determinization are performed simultaneously, which is of great importance for "on-the-fly" operation. The algorithm pushes the language model weights towards the initial state of the network. Our results show that it is advantageous to use the maximum amount of information as early as possible in the decoding procedure.
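As a rough illustration of the composition half of the idea (the paper's simultaneous determinization and weight pushing are omitted), a lazy weighted-transducer composition that only ever expands state pairs reachable from the start — the property that makes "on-the-fly" operation possible — can be sketched as follows. The representation (arc tuples, tropical weights) is an assumption for the sketch, not the authors' implementation:

```python
from collections import defaultdict

# Lazy composition of two weighted transducers in the tropical semiring
# (weights add along a path). Arcs: (src, in_label, out_label, weight, dst).
# Epsilon labels are ignored for brevity.
def compose(arcs_a, finals_a, arcs_b, finals_b):
    """Compose A (maps x->y) with B (maps y->z); only state pairs
    reachable from (0, 0) are ever created."""
    out_a, out_b = defaultdict(list), defaultdict(list)
    for src, i, o, w, dst in arcs_a:
        out_a[src].append((i, o, w, dst))
    for src, i, o, w, dst in arcs_b:
        out_b[src].append((i, o, w, dst))

    state_id = {(0, 0): 0}
    stack = [(0, 0)]
    arcs, finals = [], set()
    while stack:
        pa, pb = stack.pop()
        if pa in finals_a and pb in finals_b:
            finals.add(state_id[(pa, pb)])
        for i, o, w1, da in out_a[pa]:
            for i2, o2, w2, db in out_b[pb]:
                if o == i2:  # A's output label must match B's input label
                    if (da, db) not in state_id:
                        state_id[(da, db)] = len(state_id)
                        stack.append((da, db))
                    arcs.append((state_id[(pa, pb)], i, o2, w1 + w2,
                                 state_id[(da, db)]))
    return arcs, finals
```

In a decoder, the lexicon and language-model transducers would play the roles of A and B, with determinization interleaved with this expansion rather than run afterwards.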
Citations: 28
Searching for the missing piece [speech recognition]
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034629
W. N. Choi, Y. W. Wong, T. Lee, P. Ching
The tree-trellis forward-backward algorithm has been widely used for N-best searching in continuous speech recognition. In conventional approaches, the heuristic score used for the A* backward search is derived from the partial-path scores recorded during the forward pass. The inherently delayed use of a language model in the lexical tree structure leads to inefficient pruning and the partial-path score recorded is an underestimated heuristic score. This paper presents a novel method of computing the heuristic score that is more accurate than the partial-path score. The goal is to recover high-score sentence hypotheses that may have been pruned halfway during the forward search due to the delayed use of the LM. For the application of Hong Kong stock information inquiries, the proposed technique shows a noticeable performance improvement. In particular, a relative error-rate reduction of 12% has been achieved for top-1 sentences.
Citations: 0
Simultaneous recognition of distant talking speech of multiple sound sources based on 3-D N-best search algorithm
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034600
P. Heracleous, S. Nakamura, K. Shikano
This paper deals with the simultaneous recognition of distant-talking speech of multiple talkers using the 3D N-best search algorithm. We describe the basic idea of the 3D N-best search and address two additional techniques implemented in the baseline system: a path distance-based clustering and a likelihood normalization technique, both of which proved necessary in order to build an efficient system for our purpose. In previous work we introduced the results of experiments carried out on simulated data. In this paper we introduce the results of experiments carried out using reverberated data: data simulated by the image method and data recorded in a real room. The image method was used to find the accuracy-reverberation time relationship, and the real data were used to evaluate the real performance of our algorithm. Using the image method, the Top-3 simultaneous word accuracy obtained was 73.02% under a reverberation time of 162 ms.
Citations: 1
Acoustic analysis and recognition of whispered speech
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034676
Taisuke Itoh, K. Takeda, F. Itakura
The acoustic properties and a recognition method of whispered speech are discussed. A whispered speech database that consists of whispered speech, normal speech and the corresponding facial video images of more than 6,000 sentences from 100 speakers was prepared. The comparison between whispered and normal utterances show that: 1) the cepstrum distance between them is 4 dB for voiced and 2 dB for unvoiced phonemes; 2) the spectral tilt of whispered speech is less sloped than for normal speech; 3) the frequency of the lower formants (below 1.5 kHz) is lower than that of normal speech. Acoustic models (HMM) trained by the whispered speech database attain an accuracy of 60% in syllable recognition experiments. This accuracy can be improved to 63% when MLLR (maximum likelihood linear regression) adaptation is applied, while the normal speech HMMs adapted with whispered speech attain only 56% syllable accuracy.
Citations: 13
Robust speaker clustering in eigenspace
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034588
R. Faltlhauser, G. Ruske
We propose a speaker clustering scheme working in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can be applied. Moreover, clustering can be accomplished with base models (for eigenvoice projection) like Gaussian mixture models as well as conventional HMMs. In case of HMMs, re-projection to the original space readily yields acoustic models. Clustering in subspace produces a well-balanced cluster and is easy to control. In the field of speaker adaptation, several principal techniques can be distinguished. The most prominent among them are Bayesian adaptation (e.g. MAP), transformation based approaches (MLLR - maximum likelihood linear regression), as well as so-called eigenspace techniques. Especially the latter have become increasingly popular, as they make use of a-priori information about the distribution of speaker models. The basic approach is commonly called the eigenvoice (EV) approach. Besides these techniques, speaker clustering is a further attractive adaptation scheme, especially since it can be - and has been - easily combined with the above methods.
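A minimal numpy sketch of the subspace step — deriving 'eigenvoices' as principal components of speaker supervectors, projecting into the low-dimensional space, and clustering there with plain Euclidean distance — is shown below. The data, dimensions, and two-cluster setup are all synthetic stand-ins, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 20 speakers from 2 underlying groups,
# each speaker a 100-dim "supervector" of stacked model means.
base = np.vstack([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])
supervectors = np.vstack([base[i % 2] + rng.normal(0, 0.3, 100)
                          for i in range(20)])

# Eigenvoices = principal components of the mean-centered supervectors.
mean = supervectors.mean(axis=0)
centered = supervectors - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
eigenvoices = vt[:2]                  # keep a 2-D subspace
coords = centered @ eigenvoices.T     # each speaker becomes a 2-D point

# Simple 2-means clustering in the subspace using Euclidean distance,
# initialized from the first two speakers (who fall in different groups here).
centers = coords[:2].copy()
for _ in range(10):
    labels = np.argmin(((coords[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([coords[labels == k].mean(axis=0) for k in (0, 1)])
```

Re-projection back to the original space, as the abstract notes for HMM base models, would just be `mean + coords @ eigenvoices`.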
Citations: 25
Evaluating dialogue strategies and user behavior
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034630
M. Danieli
Summary form only given. The need for accurate and flexible evaluation frameworks for spoken and multimodal dialogue systems has become crucial. In the early design phases of spoken dialogue systems, it is worthwhile evaluating the user's ease in interacting with different dialogue strategies, rather than the efficiency of the dialogue system in providing the required information. The success of a task-oriented dialogue system greatly depends on the ability to provide a meaningful match between user expectations and system capabilities, and a good trade-off improves the user's effectiveness. The evaluation methodology requires three steps. The first step has the goal of individuating the different tokens and relations that constitute the user's mental model of the task. Once tokens and relations are considered for designing one or more dialogue strategies, the evaluation enters its second step, which is constituted by a between-group experiment. Each strategy is tried by a representative set of experimental subjects. The third step includes measuring user effectiveness in providing the spoken dialogue system with the information it needs to solve the task. The paper argues that the application of the three-step evaluation method may increase our understanding of the user's mental model of a task during the early stages of development of a spoken language agent. Experimental data supporting this claim are reported.
Citations: 0
Incremental language models for speech recognition using finite-state transducers
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034620
Hans J. G. A. Dolfing, I. L. Hetherington
In the context of the weighted finite-state transducer approach to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sources, modeled as transducers, are too large to be composed and optimized. While the recognition decoder perceives a single, weighted finite-state transducer, we apply a divide-and-conquer technique to split the language model into two parts which add up exactly to the original language model. We investigate the merits of these 'incremental language models' and present some initial results.
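The "add up exactly" property can be illustrated with a two-part factoring of a toy bigram model: a small early part G1 (here a unigram) and a correction part G2 carrying the difference, so that applying both reproduces the original log-probability exactly. The numbers and dict representation are hypothetical — the paper operates on weighted transducers, not tables:

```python
import math

# Toy bigram LM P(w | h) and a unigram P(w), both hypothetical.
bigram = {('the', 'cat'): 0.2, ('the', 'dog'): 0.3}
unigram = {'cat': 0.1, 'dog': 0.25}

# Split: log P(w | h) = log P(w) + [log P(w | h) - log P(w)].
# G1 can be applied early (small, cheap to compose); G2 later.
g1 = {w: math.log(p) for w, p in unigram.items()}
g2 = {hw: math.log(p) - g1[hw[1]] for hw, p in bigram.items()}

def score(h, w):
    # Sum of the two parts equals the original bigram log-probability.
    return g1[w] + g2[(h, w)]
```

The decoder then sees the combined scores as if the full model were a single transducer, while only the small part needs to be statically composed and optimized.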
Citations: 50
Speech interfaces for mobile communications
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034596
H. Nakano
This paper explains speech interfaces for mobile communication. Mobile interfaces have three important design rules: do not disturb the user's main task, work within the restrictions of the user's abilities, and minimize resource requirements. Social acceptance is also important. In Japan, trial and regular services with speech interfaces in mobile environments have already been launched, but they are not widely used; speech interfaces for mobile use must be improved. The speech interface will not replace Web browsers, but should support and interwork with other interfaces. We also have to discover content that suits speech interfaces.
Citations: 2