2008 IEEE Spoken Language Technology Workshop: Latest Papers

Morphological random forests for language modeling of inflectional languages
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777872
I. Oparin, O. Glembek, L. Burget, J. Černocký
In this paper, we are concerned with using decision trees (DT) and random forests (RF) in language modeling for Czech large-vocabulary continuous speech recognition (LVCSR). We show that the RF approach can be successfully implemented for language modeling of an inflectional language. Performance of word-based and morphological DTs and RFs was evaluated on a lecture recognition task. We show that while DTs perform worse than conventional trigram language models (LM), RFs of both kinds outperform the latter. Reductions in WER (up to 3.4% relative) and perplexity (10%) over the trigram model can be gained with morphological RFs. Further improvement is obtained after interpolation of DT and RF LMs with the trigram one (up to 15.6% perplexity and 4.8% WER relative reduction). In this paper we also investigate the distribution of morphological feature types chosen for splitting data at different levels of DTs.
Citations: 16
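The interpolation of DT/RF language models with the trigram model mentioned in the abstract above can be illustrated with a minimal sketch. It assumes plain linear interpolation with a single weight `lam`, which the abstract does not confirm; the function names and all probability values are hypothetical and chosen only to show how a relative perplexity reduction is computed.

```python
import math

def interpolate(p_a, p_b, lam):
    """Linearly interpolate two probabilities of the same word in the same context."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_a + (1.0 - lam) * p_b

def perplexity(probs):
    """Perplexity of a word sequence from its per-word probabilities."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))

# Hypothetical per-word probabilities for one short test sequence
# (not taken from the paper).
p_trigram = [0.10, 0.02, 0.30, 0.05]
p_forest = [0.08, 0.06, 0.20, 0.10]

lam = 0.5  # assumed interpolation weight; normally tuned on held-out data
p_mix = [interpolate(a, b, lam) for a, b in zip(p_forest, p_trigram)]

ppl_trigram = perplexity(p_trigram)
ppl_mix = perplexity(p_mix)

# Relative reduction, the form in which the abstract reports its gains.
relative_reduction = (ppl_trigram - ppl_mix) / ppl_trigram
print(f"trigram: {ppl_trigram:.2f}, interpolated: {ppl_mix:.2f}, "
      f"relative reduction: {relative_reduction:.1%}")
```

Because each model tends to be confident on different words, the mixture can assign a higher average log-probability than either model alone, which is one way an interpolated model might end up with lower perplexity than the trigram baseline.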
Performance analysis of spectral and prosodic features and their fusion for emotion recognition in speech
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777903
Manish Gaurav
In this paper, we study the performance of different prosodic and spectral features of speech on an emotion detection task. In particular, a feature selection algorithm has been used to assess the relevance of the different features. Gaussian mixture models (GMMs) have been used to model the features extracted at the frame level, while support vector machines (SVM) and k-nearest neighbor (k-NN) methods have been used to model the features extracted at the utterance level. We use a normalization approach (T-norm) to combine the scores from the different models. The results using the above approach are reported for the Berlin emotional database corpus; the task consisted of classifying six emotions: anger, happiness, neutral, sadness, boredom and anxiety. We show that the use of a feature selection algorithm improves the result, and that the fusion of GMM and SVM yields an overall accuracy of 75.4% for the above task.
Citations: 15
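The score combination step in the abstract above can be sketched as follows. T-norm is usually described as standardizing a score with the mean and standard deviation of scores from a cohort of competing models. The abstract does not say how the cohort was chosen or how the normalized scores were merged, so this sketch assumes the cohort for each emotion is the set of the other emotions' scores from the same classifier, and that normalized scores are simply summed. All names and score values are hypothetical.

```python
import statistics

def t_norm(scores):
    """Standardize each class score against the scores of the competing classes."""
    normalized = {}
    for label, score in scores.items():
        cohort = [v for k, v in scores.items() if k != label]
        mu = statistics.mean(cohort)
        sigma = statistics.pstdev(cohort)
        # Guard against a degenerate cohort in which all scores are equal.
        normalized[label] = (score - mu) / sigma if sigma > 0 else 0.0
    return normalized

def fuse(score_sets):
    """Sum the T-normalized scores of several classifiers per class (assumed rule)."""
    fused = {}
    for scores in score_sets:
        for label, z in t_norm(scores).items():
            fused[label] = fused.get(label, 0.0) + z
    return fused

# Hypothetical scores for one utterance: GMM log-likelihoods and SVM margins
# live on very different scales, which is why normalization is needed.
gmm_scores = {"anger": -41.0, "happiness": -44.5, "neutral": -47.0, "sadness": -52.0}
svm_scores = {"anger": 0.4, "happiness": 0.9, "neutral": -0.3, "sadness": -1.1}

fused = fuse([gmm_scores, svm_scores])
predicted = max(fused, key=fused.get)
print(predicted, fused)
```

After normalization both classifiers contribute unitless scores, so neither dominates the sum merely because its raw outputs have a larger range.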
Impact of dynamic model adaptation beyond speech recognition
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777894
Fernando Batista, R. Amaral, I. Trancoso, N. Mamede
The application of speech recognition to live subtitling of Broadcast News has motivated the adaptation of the lexical and language models of the recognizer on a daily basis with text material retrieved from online newspapers. This paper studies the impact of this adaptation on two of the blocks following the speech recognition module: capitalization and topic indexation. We describe and evaluate different adaptation approaches that try to explore the language dynamics.
Citations: 2
Recent improvements in BBN's English/Iraqi speech-to-speech translation system
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777886
F. Choi, S. Tsakalidis, S. Saleem, C. Kao, R. Meermeier, K. Krstovski, C. Moran, Krishna Subramanian, R. Prasad, P. Natarajan
We report on recent improvements in our English/Iraqi Arabic speech-to-speech translation system. User interface improvements include a novel parallel approach to user confirmation which makes confirmation cost-free in terms of dialog duration. Automatic speech recognition improvements include the incorporation of state-of-the-art techniques in feature transformation and discriminative training. Machine translation improvements include a novel combination of multiple alignments derived from various pre-processing techniques, such as Arabic segmentation and English word compounding, higher-order N-grams for the target language model, and the use of context in the form of semantic classes and part-of-speech tags.
Citations: 9
Correcting ASR outputs: Specific solutions to specific errors in French
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777878
Richard Dufour, Y. Estève
Automatic speech recognition (ASR) systems are used in a large number of applications, in spite of the inevitable recognition errors. In this study we propose a pragmatic approach to automatically repair ASR outputs by taking into account linguistic and acoustic information, using formal rules or stochastic methods. The proposed strategy consists of developing a specific correction solution for each specific kind of error. In this paper, we apply this strategy to two case studies specific to the French language. We show that it is possible, on automatic transcriptions of French broadcast news, to decrease the error rate of a specific error by 11.4% in one of the two case studies, and by 86.4% in the other. These results are encouraging and show the value of developing more specific solutions to cover a wider set of errors in future work.
Citations: 8
Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion
Pub Date : 2008-12-01 DOI: 10.1109/slt.2008.4777870
Fei Liu, Feifan Liu, Yang Liu
Citations: 53
Continuous topic language modeling for speech recognition
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777873
C. Chueh, Jen-Tzung Chien
A continuous representation of word sequences can effectively address the data sparseness problem in n-gram language models, where words are represented as discrete variables and unseen events are likely to occur. This problem becomes increasingly severe when extracting long-distance regularities for high-order n-gram models. Rather than considering the discrete word space, we construct a continuous space of word sequences in which the latent topic information is extracted. The continuous vector is formed by the topic posterior probabilities, and the least-squares projection matrix from the discrete word space to the continuous topic space is estimated accordingly. Unseen words can be predicted through the new continuous latent topic language model. In experiments on continuous speech recognition, we obtain significant performance improvement over the conventional topic-based language model.
Citations: 1
“Who is this” quiz dialogue system and users' evaluation
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777862
M. Sawaki, Yasuhiro Minami, Ryuichiro Higashinaka, Kohji Dohsaka, Eisaku Maeda
In order to design a dialogue system that users enjoy and want to be near for a long time, it is important to know the effect of the system's actions on users. This paper describes the “Who is this” quiz dialogue system and its users' evaluation. Its quiz-style information presentation has been found effective for educational tasks. In our ongoing effort to make it closer to a conversational partner, we implemented the system as a stuffed toy (or CG equivalent). Quizzes are automatically generated from Wikipedia articles, rather than from hand-crafted sets of biographical facts. Network mining is utilized to prepare adaptive system responses. Experiments showed the effectiveness of the person network and the relationship between user attributes and interest level.
Citations: 2
Caller Experience: A method for evaluating dialog systems and its automatic prediction
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777857
Keelan Evanini, P. Hunter, J. Liscombe, David Suendermann-Oeft, K. Dayanidhi, R. Pieraccini
In this paper we introduce a subjective metric for evaluating the performance of spoken dialog systems, caller experience (CE). CE is a useful metric for tracking the overall performance of a system in deployment, as well as for isolating individual problematic calls in which the system underperforms. The proposed CE metric differs from most performance evaluation metrics proposed in the past in that it is a) a subjective, qualitative rating of the call, and b) provided by expert, external listeners, not the callers themselves. The results of an experiment in which a set of human experts listened to the same calls three times are presented. The fact that these results show a high level of agreement among different listeners, despite the subjective nature of the task, demonstrates the validity of using CE as a standard metric. Finally, an automated rating system using objective measures is shown to perform at the same high level as the humans. This is an important advance, since it provides a way to reduce the human labor costs associated with producing a reliable CE.
Citations: 32
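The listener agreement reported in the abstract above can be made concrete with a small sketch. The abstract does not name the agreement statistic that was used, so Cohen's kappa for two raters is swapped in here purely as an illustration. The 1 to 5 rating scale and the ratings themselves are hypothetical.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    # Fraction of items on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's rating frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical caller-experience ratings (1 = worst, 5 = best) for eight calls.
rater_1 = [5, 4, 4, 2, 1, 3, 5, 2]
rater_2 = [5, 4, 3, 2, 1, 3, 5, 1]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

The same function could in principle score an automated rater against a human one, since the comparison only needs two aligned lists of ratings. For ordinal scales like this one, a weighted variant that gives partial credit to near-misses may be more appropriate.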
Starting to cook a tutoring dialogue system
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777861
Filipe M. Martins, Joana Paulo Pardal, Luís Franqueira, Pedro Arez, N. Mamede
This paper presents a system that helps you cook a recipe through a spoken dialogue tutoring session. We report our experience while creating the first version of a tutoring dialogue system that helps the user cook a selected dish. Having a working framework to support us with the creation of the cooking assistant, the main challenge we faced was the change of paradigm: instead of the system being driven by the user, the user is instructed by the system. The result is a system capable of dictating generic contents to the user. On top of it, the system can be used in several domains where the goal is not the replacement of the user but providing some assistance while (s)he performs some procedural task.
Citations: 14