
Latest publications from the 2008 IEEE Spoken Language Technology Workshop

Real-time speech recognition captioning of events and meetings
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777874
Gilles Boulianne, M. Boisvert, Frédéric Osterrath
Real-time speech recognition captioning has not progressed much beyond television broadcast to other tasks such as meetings in the workplace. A number of obstacles prevent this transition, such as the lack of proper means to receive and display captions, or the cost of on-site shadow speakers. More problematic is the insufficient performance of speech recognition for less formal and one-time events. We describe how we developed a mobile platform for remote captioning during trials in several conferences and meetings. We also show that sentence selection based on relative entropy allows training of adequate language models with small amounts of in-domain data, making real-time captioning of an event possible with only a few hours of preparation.
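The abstract does not spell out the selection criterion; a minimal greedy sketch of the general idea — picking sentences whose addition keeps the selected set's unigram distribution closest, in KL divergence, to the in-domain distribution — might look as follows (function names and the add-one smoothing are illustrative assumptions, not the paper's method):

```python
from collections import Counter
import math

def unigram_dist(tokens, vocab):
    """Add-one-smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def kl_divergence(p, q):
    """Relative entropy D(p || q); both distributions share the same support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def select_sentences(pool, in_domain_tokens, k):
    """Greedily pick k sentences from the pool whose inclusion keeps the
    selected set's unigram distribution closest to the in-domain one."""
    vocab = set(in_domain_tokens) | {w for s in pool for w in s}
    p = unigram_dist(in_domain_tokens, vocab)
    selected, sel_tokens, remaining = [], [], list(pool)
    for _ in range(min(k, len(remaining))):
        best = min(remaining,
                   key=lambda s: kl_divergence(p, unigram_dist(sel_tokens + list(s), vocab)))
        selected.append(best)
        sel_tokens.extend(best)
        remaining.remove(best)
    return selected
```

Sentences that resemble the in-domain data are preferred, so a few hours of in-domain text can steer a much larger out-of-domain pool.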
Citations: 1
Quantitative evaluation of dialog corpora acquired through different techniques
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777851
D. Griol, L. Hurtado, E. Segarra, E. Arnal
In this paper, we present the results of a comparison of three corpora acquired by means of different techniques. The first corpus was acquired using the Wizard of Oz technique. A statistical user simulation technique was developed for the acquisition of the second corpus. In this technique, the next user answer is selected by means of a classification process that takes into account the previous user turns, the last system answer, and the objective of the dialog. Finally, a dialog simulation technique was developed for the acquisition of the third corpus. This technique uses a random selection of the user and system turns, defining stop conditions for automatically deciding whether the simulated dialog is successful. We use several evaluation measures proposed in previous research to compare our three acquired corpora, and then discuss the similarities and differences with regard to these measures.
Citations: 0
IslEnquirer: Social user model acquisition through network analysis and interactive learning
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777854
F. Putze, H. Holzapfel
We present an approach to introducing social awareness into interactive systems. The IslEnquirer is a system that automatically builds social user models. It initializes the models through social network analysis of available offline data. These models are then verified and extended by interactive learning, carried out through a robot-initiated spoken dialog with the user.
Citations: 2
Automatic title generation for Chinese spoken documents with a delicate scored Viterbi algorithm
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777866
Sheng-yi Kong, Chien-Chih Wang, Ko-chien Kuo, Lin-Shan Lee
Automatic title generation for spoken documents is believed to be an important key for browsing and navigating huge quantities of multimedia content. A new framework for automatic title generation for Chinese spoken documents is proposed in this paper, using a delicate scored Viterbi algorithm performed over automatically generated text summaries of the spoken documents under test. The Viterbi beam search is guided by a delicate score evaluated from three sets of models: the term selection model indicates the most suitable terms to be included in the title, the term ordering model gives the best ordering of the terms to make the title readable, and the title length model gives a reasonable length for the title. The models are trained from a training corpus that is not required to match the spoken documents under test. Both objective evaluation based on the F1 measure and subjective human evaluation of relevance and readability indicate that the approach is very attractive.
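The three models are only described qualitatively in the abstract; a toy sketch of a beam search that combines a per-term selection score, a bigram ordering score, and a length score could look like the following (all scoring functions here are hypothetical stand-ins, not the paper's trained models):

```python
def beam_search_title(terms, select_score, bigram_score, length_score,
                      beam_width=5, max_len=6):
    """Beam search over ordered subsets of summary terms. A hypothesis is
    scored as: sum of term-selection scores + sum of bigram ordering
    scores + a score for the current title length."""
    beams = [(length_score(0), ())]   # (score, term sequence)
    complete = list(beams)
    for _ in range(max_len):
        expanded = []
        for score, seq in beams:
            for term in terms:
                if term in seq:
                    continue
                prev = seq[-1] if seq else "<s>"
                new_score = (score - length_score(len(seq))   # swap in new length score
                             + select_score(term)
                             + bigram_score(prev, term)
                             + length_score(len(seq) + 1))
                expanded.append((new_score, seq + (term,)))
        expanded.sort(key=lambda h: -h[0])
        beams = expanded[:beam_width]
        complete.extend(beams)
    return max(complete, key=lambda h: h[0])[1]
```

Keeping every pruned-beam generation in `complete` lets the length model, rather than a fixed cutoff, decide where the title ends.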
Citations: 8
Open vocabulary spoken document retrieval by subword sequence obtained from speech recognizer
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777900
Go Kuriki, Y. Itoh, K. Kojima, M. Ishigame, Kazuyo Tanaka, Shi-wook Lee
We present a method for open vocabulary retrieval based on a spoken document retrieval (SDR) system using subword models. The present paper proposes a new approach to an open vocabulary SDR system using subword models that does not require subword recognition. Instead, subword sequences are obtained from the recognizer's phone output: for speech containing an out-of-vocabulary (OOV) word, the speech recognizer outputs a word sequence whose phone sequence is considered to be similar to that of the OOV word. When OOV words are provided in a query, the proposed system is able to retrieve the target section by comparing the phone sequence of the query with that of the word sequence generated by the speech recognizer.
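The comparison step can be pictured as approximate string matching over phone sequences; a simple sketch is given below (the fixed-size sliding window and plain Levenshtein distance are assumptions for illustration — the paper's actual matching procedure is not given in the abstract):

```python
def edit_distance(a, b):
    """Levenshtein distance between two phone sequences (single-row DP)."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete pa
                                     dp[j - 1] + 1,      # insert pb
                                     prev + (pa != pb))  # substitute / match
    return dp[len(b)]

def best_match(query_phones, doc_phones, window=None):
    """Slide a window over the document's phone stream and return
    (start index, distance) of the section closest to the query."""
    window = window or len(query_phones)
    best = (0, float("inf"))
    for i in range(max(1, len(doc_phones) - window + 1)):
        d = edit_distance(query_phones, doc_phones[i:i + window])
        if d < best[1]:
            best = (i, d)
    return best
```

Because the match is at the phone level, a query word never seen by the recognizer's vocabulary can still locate the section where its phones were (mis)recognized.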
Citations: 0
Name aware speech-to-speech translation for English/Iraqi
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777887
R. Prasad, C. Moran, F. Choi, R. Meermeier, S. Saleem, C. Kao, D. Stallard, P. Natarajan
In this paper, we describe a novel approach that exploits intra-sentence and dialog-level context to improve translation performance on spoken Iraqi utterances that contain named entities (NEs). Dialog-level context is used to predict whether the Iraqi response is likely to contain names, and the intra-sentence context is used to determine which words are named entities. While we do not address the problem of translating out-of-vocabulary (OOV) NEs in spoken utterances, we show that our approach is capable of translating OOV names in text input. To demonstrate the efficacy of our approach, we present results on an internal test set as well as the June 2008 DARPA TRANSTAC name evaluation set.
Citations: 6
Effects of self-disclosure and empathy in human-computer dialogue
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777852
Ryuichiro Higashinaka, Kohji Dohsaka, Hideki Isozaki
To build trust or cultivate long-term relationships with users, conversational systems need to perform social dialogue. To date, research has primarily focused on the overall effect of social dialogue in human-computer interaction, leading to little work on the effects of individual linguistic phenomena within social dialogue. This paper investigates such individual effects through dialogue experiments. Focusing on self-disclosure and empathic utterances (agreement and disagreement), we empirically calculate their contributions to the dialogue quality. Our analysis shows that (1) empathic utterances by users are strong indicators of increasing closeness and user satisfaction, (2) the system's empathic utterances are effective for inducing empathy from users, and (3) self-disclosure by users increases when users have positive preferences on topics being discussed.
Citations: 41
Discriminative learning using linguistic features to rescore n-best speech hypotheses
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777849
Maria Georgescul, Manny Rayner, P. Bouillon, Nikos Tsourakis
We describe how we were able to improve the accuracy of a medium-vocabulary spoken dialog system by rescoring the list of n-best recognition hypotheses using a combination of acoustic, syntactic, semantic, and discourse information. The non-acoustic features are extracted from different intermediate processing results produced by the natural language processing module, and automatically filtered. We apply discriminative support vector learning designed for re-ranking, using both word error rate and semantic error rate as ranking target values, and evaluate using five-fold cross-validation; to show the robustness of our method, confidence intervals for word and semantic error rates are computed via bootstrap sampling. The reduction in semantic error rate, from 19% to 11%, is statistically significant at the 0.01 level.
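As an illustration of the pairwise re-ranking idea — not the authors' SVM setup; this substitutes a simple perceptron-style update for support vector learning — one might write:

```python
def train_pairwise_ranker(nbest_lists, epochs=10, lr=0.1):
    """Learn linear weights so that, within each n-best list, hypotheses
    with lower error rate score higher. nbest_lists holds lists of
    (feature_vector, error_rate) pairs; a perceptron update is applied
    whenever a pair's ranking is violated."""
    dim = len(nbest_lists[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for nbest in nbest_lists:
            for fa, ea in nbest:
                for fb, eb in nbest:
                    if ea < eb:  # hypothesis a is the better one
                        margin = sum(wi * (xa - xb) for wi, xa, xb in zip(w, fa, fb))
                        if margin <= 0:  # ranking violated: push a above b
                            w = [wi + lr * (xa - xb) for wi, xa, xb in zip(w, fa, fb)]
    return w

def top_hypothesis(w, features):
    """Index of the highest-scoring hypothesis after re-ranking."""
    scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in features]
    return max(range(len(scores)), key=scores.__getitem__)
```

The same pairwise construction works with word error rate or semantic error rate as the target, since only the relative ordering of hypotheses within a list matters.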
Citations: 4
Joint generative and discriminative models for spoken language understanding
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777840
Marco Dinarelli, Alessandro Moschitti, G. Riccardi
Spoken Language Understanding aims at mapping a natural language spoken sentence into a semantic representation. In the last decade two main approaches have been pursued: generative and discriminative models. The former is more robust to overfitting, whereas the latter is more robust to many irrelevant features. Additionally, the way in which these approaches encode prior knowledge is very different, and their relative performance changes with the task. In this paper we describe a training framework in which both models are used: a generative model produces a list of ranked hypotheses, and a discriminative model based on string kernels and Support Vector Machines re-ranks this list. We tested this approach on a new corpus produced in the European LUNA project. The results show a large improvement over the state-of-the-art in concept segmentation and labeling.
Citations: 3
A research bed for unit selection based text to speech synthesis
Pub Date : 2008-12-01 DOI: 10.1109/SLT.2008.4777882
K. Sarathy, A. Ramakrishnan
The paper describes a modular, unit selection based TTS framework that can be used as a research bed for developing TTS in any new language, as well as for studying the effect of changing any parameter during synthesis. Using this framework, a TTS system has been developed for Tamil. The synthesis database consists of 1027 phonetically rich pre-recorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications like mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resulting synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that with the uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, and thus propose to exploit this misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with their confused counterparts.
Citations: 10