
2012 IEEE Spoken Language Technology Workshop (SLT): Latest Papers

Combining criteria for the detection of incorrect entries of non-native speech in the context of foreign language learning
Pub Date : 2012-12-02 DOI: 10.1109/SLT.2012.6424261
Luiza Orosanu, D. Jouvet, D. Fohr, I. Illina, A. Bonneau
This article analyzes the detection of incorrect entries of non-native speech in the context of foreign language learning. The purpose is to detect and reject incorrect entries (i.e. those for which the speech signal does not correspond at all to the associated text) while being tolerant to the mispronunciations of non-native speech. The proposed approach exploits the comparison between two text-to-speech alignments: one constrained by the text being checked, and an unconstrained one corresponding to a phonetic decoding. Several comparison criteria are described and combined via a logistic regression function. The article analyzes the influence of different settings, such as the impact of non-native pronunciation variants, the impact of learning the decision functions on native or on non-native speech, as well as the impact of combining various comparison criteria. The performance evaluations are conducted both on native and on non-native speech.
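As a rough illustration of how several comparison criteria might be merged by a logistic regression function, here is a minimal sketch. The two criteria, the weights, the bias, and the 0.5 threshold are all hypothetical placeholders chosen for the example; the abstract does not list the actual criteria or the trained coefficients.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def incorrect_entry_probability(criteria, weights, bias):
    """Combine comparison criteria into a single score in (0, 1)."""
    if len(criteria) != len(weights):
        raise ValueError("criteria and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, criteria))
    return logistic(z)


# Hypothetical criteria comparing the text-constrained alignment with the
# free phonetic decoding:
#   [fraction of frames whose phone labels differ,
#    normalized log-likelihood gap between the two alignments]
WEIGHTS = [4.0, 2.0]  # illustrative values, not taken from the paper
BIAS = -3.0           # illustrative value, not taken from the paper


def should_reject(criteria, threshold=0.5):
    """Reject the entry when the combined score reaches the threshold."""
    return incorrect_entry_probability(criteria, WEIGHTS, BIAS) >= threshold
```

In practice the weights and bias would be fitted on labeled correct and incorrect entries, and the threshold tuned to trade false rejections against false acceptances.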
Citations: 1
Crowdsourcing the acquisition of natural language corpora: Methods and observations
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424200
William Yang Wang, D. Bohus, Ece Kamar, E. Horvitz
We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.
Citations: 58
Exploiting the Semantic Web for unsupervised spoken language understanding
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424227
Larry Heck, Dilek Z. Hakkani-Tür
This paper proposes an unsupervised training approach for SLU systems that leverages the structured semantic knowledge graphs of the emerging Semantic Web. The approach creates natural language surface forms of entity-relation-entity portions of knowledge graphs using a combination of web search retrieval and syntax-based dependency parsing. The new forms are used to train an SLU system in an unsupervised manner. This paper tests the approach on the problem of intent detection, and shows that the unsupervised training procedure matches the performance of supervised training over operating points important for commercial applications.
Citations: 56
An automatic pitch accent feedback system for English learners with adaptation of an English corpus spoken by Koreans
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424263
Sechun Kang, G. G. Lee, Ho-Young Lee, Byeongchang Kim
To improve the English proficiency of Korean learners, we design a system for pitch accents, which consists of prediction, detection and feedback parts. The prediction and detection parts adopt Conditional Random Field models to achieve a prediction accuracy of 87.25%, which is based on the Boston University radio news corpus, and a detection accuracy of 81.21%, which is based on the Korean Learner's English Accentuation corpus. In the learner experiment with our system, learners' pitch accent proficiency, as assessed by English experts, was improved from 2.67 to 3.25 on a scale of 1-to-5, and the accuracy of not-wrong feedback was measured at 82.77%. The learners assessed the learning effectiveness of our system at 4.3 on a scale of 1-to-5.
Citations: 1
Recognition rate estimation based on word alignment network and discriminative error type classification
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424207
A. Ogawa, Takaaki Hori, Atsushi Nakamura
Techniques for estimating recognition rates without using reference transcriptions are essential if we are to judge whether or not speech recognition technology is applicable to a new task. This paper proposes two recognition rate estimation methods for continuous speech recognition. The first is an easy-to-use method based on a word alignment network (WAN) obtained from a word confusion network through simple conversion procedures. A WAN contains the correct (C), substitution error (S), insertion error (I) and deletion error (D) probabilities word-by-word for a recognition result. By summing these CSID probabilities individually, the percent correct and word accuracy (WACC) can be estimated without using a reference transcription. The second, more advanced method refines the CSID probabilities provided by a WAN based on discriminative error type classification (ETC) and estimates the recognition rates more accurately. In the experiments on the MIT lecture speech corpus, we obtained a correlation coefficient of 0.97 between the true WACCs calculated by a scoring tool using reference transcriptions and the WACCs estimated from the discriminative ETC results.
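The summation step described above can be sketched as follows. This assumes the conventional scoring definitions, with the expected number of reference words taken as C + S + D, percent correct as C / N, and word accuracy as (C - I) / N; the paper's exact estimator may differ, and the `CSID` type and `estimate_rates` function are hypothetical names introduced only for this example.

```python
from typing import Iterable, NamedTuple, Tuple


class CSID(NamedTuple):
    """Per-slot probabilities of correct, substitution, insertion, deletion."""
    c: float
    s: float
    i: float
    d: float


def estimate_rates(slots: Iterable[CSID]) -> Tuple[float, float]:
    """Return (percent correct, word accuracy) in percent, without a reference.

    The four probability types are summed separately over all slots and
    treated as expected counts.
    """
    slots = list(slots)
    c = sum(x.c for x in slots)
    s = sum(x.s for x in slots)
    i = sum(x.i for x in slots)
    d = sum(x.d for x in slots)
    n_ref = c + s + d  # expected number of reference words (assumption)
    if n_ref <= 0:
        raise ValueError("expected reference length must be positive")
    percent_correct = 100.0 * c / n_ref
    word_accuracy = 100.0 * (c - i) / n_ref
    return percent_correct, word_accuracy
```

For example, four slots with probabilities (1, 0, 0, 0), (0.5, 0.5, 0, 0), (0, 0, 1, 0) and (0, 0, 0, 1) give expected counts C = 1.5, S = 0.5, I = 1, D = 1, so N = 3, percent correct is 50%, and word accuracy is about 16.7%.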
Citations: 9
Generating grammar questions using corpus data in L2 learning
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424265
Kyusong Lee, Soo-Ok Kweon, Hongsuck Seo, G. G. Lee
This paper examines how grammar questions are automatically generated for L2 learning by applying a sequential labeling technique to learner corpora. We developed a model that helps detect possible error positions and select the most appropriate form among choices. Discriminant models such as conditional random field and maximum entropy are used to generate the error identification question. Questions generated by the proposed method corresponded highly to questions that experts made. Our data-driven approach lends itself to any language without costing expensive expertise.
Citations: 3
Modeling intensity contours and the interaction of pitch and intensity to improve automatic prosodic event detection and classification
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424253
A. Rosenberg
Prosody, or the way words are spoken, carries important information to understanding a speaker's communicative intention. Many studies on automatic prosodic analysis focus on parameterizing pitch content. In this work, we extend previous pitch contour modeling features to intensity contours, and develop a set of features based on the interaction of pitch and intensity. These new features improve the state-of-the-art on all prosodic event detection and classification tasks related to automatic ToBI labeling.
Citations: 15
Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424226
Timothy J. Hazen, Fred Richardson
Latent topic modeling has proven to be an effective means for learning the underlying semantic content within document collections. Latent topic modeling has traditionally been applied to bag-of-words representations that ignore word sequence information that can aid in semantic understanding. In this work we introduce a method for efficiently incorporating arbitrarily long word sequences into a topic modeling approach. This method iteratively constructs a constrained set of phrase trees in an unsupervised fashion from a document collection using weighted pointwise mutual information statistics to guide the process. In experiments on the Fisher Corpus of conversational speech, the incorporation of learned phrases into a latent topic model yielded significant improvements in the unsupervised discovery of the known topics present within the data.
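To make the weighted pointwise mutual information statistic more concrete, the sketch below scores adjacent word pairs in a tokenized document collection. It assumes one common weighting, the joint probability times the PMI, which may not match the weighting used in the paper, and it covers only the scoring of candidate two-word phrases, not the iterative construction of constrained phrase trees. The function name `weighted_pmi_scores` is hypothetical.

```python
import math
from collections import Counter


def weighted_pmi_scores(docs):
    """Score each adjacent word pair by p(a, b) * log(p(a, b) / (p(a) p(b))).

    `docs` is an iterable of token lists. Probabilities are simple relative
    frequencies over the whole collection.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = p_ab * math.log(p_ab / (p_a * p_b))
    return scores


docs = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "city"],
]
scores = weighted_pmi_scores(docs)
best_pair = max(scores, key=scores.get)  # ("new", "york") for this toy data
```

Weighting by the joint probability favors pairs that are both strongly associated and frequent, which keeps rare chance co-occurrences from dominating the ranking. An iterative procedure could merge the best pair into a single token and rescore to grow longer phrases.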
Citations: 5
Simultaneous feature selection and parameter optimization for training of dialog policy by reinforcement learning
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424160
Teruhisa Misu, H. Kashioka
This paper addresses the problem of feature selection in the reinforcement learning (RL) of the dialog policies of spoken dialog systems. A statistical dialog manager selects the system actions the system should take based on the features derived from the current dialog state and/or the system's belief state. When defining the features used by the system for training the dialog policy, however, finding a set of actually effective features from potentially useful ones is not obvious. In addition, the selection should be done simultaneously with the optimization of the dialog policy. In this paper, we propose an incremental feature selection method for the optimization of a dialog policy by RL, in which improvement of the dialog policy and the feature selection are conducted simultaneously. Experiments in dialog policy optimization by RL with a user simulator demonstrated the following: 1) that the proposed method can find a better dialog policy with fewer policy iterations and 2) the learning speed is comparable with the case where feature selection is conducted in advance.
Citations: 5
On the use of phone log-likelihood ratios as features in spoken language recognition
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424235
M. Díez, A. Varona, M. Peñagarikano, Luis Javier Rodriguez-Fuentes, Germán Bordel
This paper presents an alternative feature set to the traditional MFCC-SDC used in acoustic approaches to Spoken Language Recognition: the log-likelihood ratios of phone posterior probabilities, hereafter Phone Log-Likelihood Ratios (PLLR), produced by a phone recognizer. In this work, an iVector system trained on this set of features (plus dynamic coefficients) is evaluated and compared to (1) an acoustic iVector system (trained on the MFCC-SDC feature set) and (2) a phonotactic (Phone-lattice-SVM) system, using two different benchmarks: the NIST 2007 and 2009 LRE datasets. iVector systems trained on PLLR features proved to be competitive, reaching or even outperforming the MFCC-SDC-based iVector and the phonotactic systems. The fusion of the proposed approach with the acoustic and phonotactic systems provided even more significant improvements, outperforming state-of-the-art systems on both benchmarks.
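A minimal sketch of turning one frame of phone posteriors into log-likelihood ratios is given below. It assumes flat priors over the N phone classes, so that each ratio is log(p_i / ((1 - p_i) / (N - 1))); this normalization is an assumption of the example rather than something stated in the abstract, and the function name `phone_llrs` and the clipping constant `eps` are hypothetical.

```python
import math


def phone_llrs(posteriors, eps=1e-10):
    """Convert one frame of phone posteriors into log-likelihood ratios.

    Assumes flat priors over the N phone classes, so the ratio for phone i is
    p_i / ((1 - p_i) / (N - 1)). Posteriors are clipped to [eps, 1 - eps] to
    avoid taking the logarithm of zero.
    """
    n = len(posteriors)
    if n < 2:
        raise ValueError("need at least two phone classes")
    llrs = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)
        llrs.append(math.log(p) - math.log((1.0 - p) / (n - 1)))
    return llrs


frame = [0.7, 0.1, 0.1, 0.1]   # toy posteriors for a 4-phone recognizer
features = phone_llrs(frame)   # positive for the likely phone, negative otherwise
```

Under this definition a uniform posterior vector maps to all zeros, and the resulting values are unbounded, unlike raw posteriors, which are confined to [0, 1]. Frame-level vectors of this kind would then be stacked over time and passed to the iVector front end in place of MFCC-SDC features.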
Citations: 55