
Latest publications from the 2012 IEEE Spoken Language Technology Workshop (SLT)

Reinforcement learning for spoken dialogue systems using off-policy natural gradient method
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424161
Filip Jurcícek
Reinforcement learning methods have been successfully used to optimise dialogue strategies in statistical dialogue systems. Typically, reinforcement techniques learn on-policy, i.e. the dialogue strategy is updated online while the system is interacting with a user. An alternative to this approach is off-policy reinforcement learning, which estimates an optimal dialogue strategy offline from a fixed corpus of previously collected dialogues. This paper proposes a novel off-policy reinforcement learning method based on natural policy gradients and importance sampling. The algorithm is evaluated on a spoken dialogue system in the tourist information domain. The experiments indicate that the proposed method learns a dialogue strategy that significantly outperforms the baseline handcrafted dialogue policy.
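The corpus-based estimation described above can be sketched as follows. This is a minimal illustration, not the paper's method: it uses a plain (non-natural) importance-sampled policy gradient for a tabular softmax policy on synthetic data, and all names and quantities are hypothetical.

```python
import numpy as np

def softmax_policy(theta, s):
    # Tabular softmax policy; theta has shape (n_states, n_actions).
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def off_policy_gradient(theta, corpus, behaviour_prob):
    """Importance-sampled REINFORCE gradient estimated from a fixed corpus.

    corpus: (state, action, return) triples collected offline under a
    behaviour policy whose action probability was `behaviour_prob`.
    """
    grad = np.zeros_like(theta)
    for s, a, G in corpus:
        pi = softmax_policy(theta, s)
        w = pi[a] / behaviour_prob      # importance weight: target / behaviour
        g = -pi                         # grad of log pi(a|s) for softmax...
        g[a] += 1.0                     # ...is one-hot(a) - pi
        grad[s] += w * G * g
    return grad / len(corpus)

# Toy corpus: in state 0, action 1 always earned return 1, action 0 earned 0.
theta = np.zeros((1, 2))
corpus = [(0, 1, 1.0), (0, 0, 0.0)] * 50
for _ in range(200):
    theta += 0.5 * off_policy_gradient(theta, corpus, behaviour_prob=0.5)
```

After training on the fixed corpus, the policy concentrates on the rewarded action without any online interaction, which is the essential off-policy property the paper exploits.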
Citations: 2
Employing boosting to compare cues to verbal feedback in multi-lingual dialog
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424199
Gina-Anne Levow, Siwei Wang
Verbal feedback provides important cues in establishing interactional rapport. The challenge of recognizing contexts for verbal feedback largely arises from relative sparseness and optionality. In addition, cross-language and inter-speaker variations can make recognition more difficult. In this paper, we show that boosting can improve accuracy in recognizing contexts for verbal feedback based on prosodic cues. In our experiments, we use dyads from three languages (English, Spanish and Arabic) to evaluate two boosting methods, generalized Adaboost and Gradient Boosting Trees, against Support Vector Machines (SVMs) and a naive baseline, with explicit oversampling on the minority verbal feedback instances. We find that both boosting methods outperform the baseline and SVM classifiers. Analysis of the feature weighting by the boosted classifiers highlights differences and similarities in the prosodic cues employed by members of these diverse language/cultural groups.
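A toy version of the comparison above, assuming scikit-learn and a synthetic imbalanced dataset in place of the multi-lingual dyad corpus; the oversampling factor and all parameters are illustrative only, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for sparse feedback contexts: ~10% positive class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Explicit oversampling of the minority (feedback) instances, as the
# abstract describes, by repeating the positive training examples.
pos = np.where(ytr == 1)[0]
idx = np.concatenate([np.arange(len(ytr)), np.repeat(pos, 8)])
Xb, yb = Xtr[idx], ytr[idx]

# Evaluate the two boosting methods against an SVM on held-out data.
scores = {}
for clf in (AdaBoostClassifier(), GradientBoostingClassifier(), SVC()):
    scores[type(clf).__name__] = clf.fit(Xb, yb).score(Xte, yte)
print(scores)
```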
Citations: 1
Exploiting loudness dynamics in stochastic models of turn-taking
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424201
K. Laskowski
Stochastic turn-taking models have traditionally been implemented as N-grams, which condition predictions on recent binary-valued speech/non-speech contours. The current work re-implements this function using feed-forward neural networks, capable of accepting binary- as well as continuous-valued features; performance is shown to asymptotically approach that of the N-gram baseline as model complexity increases. The conditioning context is then extended to leverage loudness contours. Experiments indicate that the additional sensitivity to loudness considerably decreases average cross entropy rates on unseen data, by 0.03 bits per framing interval of 100 ms. This reduction is shown to make loudness-sensitive conversants capable of better predictions, with attention memory requirements at least 5 times smaller and responsiveness latency at least 10 times shorter than the loudness-insensitive baseline.
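The cross-entropy-rate metric that the evaluation above reports (bits per 100 ms frame) can be illustrated as below; the frame sequence and both predictors are fabricated stand-ins, not the paper's models.

```python
import numpy as np

def cross_entropy_rate(p_pred, frames):
    """Average cross entropy in bits per frame for binary speech/non-speech
    predictions `p_pred` (probability of speech) against observed `frames`."""
    p = np.clip(p_pred, 1e-12, 1 - 1e-12)
    return float(np.mean(-(frames * np.log2(p) + (1 - frames) * np.log2(1 - p))))

frames = np.array([1, 1, 0, 1, 0, 0, 1, 1])
baseline = np.full(8, 0.5)                   # uninformed model: 1 bit/frame
informed = np.where(frames == 1, 0.8, 0.2)   # sharper (e.g. loudness-aware) model
print(cross_entropy_rate(baseline, frames))  # 1.0
print(cross_entropy_rate(informed, frames))  # ≈ 0.322
```

A reduction such as the paper's 0.03 bits per frame is an improvement in exactly this quantity, averaged over unseen data.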
Citations: 6
Towards a new speech event detection approach for landmark-based speech recognition
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424247
Stefan Ziegler, Bogdan Ludusan, G. Gravier
In this work, we present a new approach to the classification and detection of speech units for use in landmark- or event-based speech recognition systems. We use segmentation to model any time-variable speech unit as a fixed-dimensional observation vector, in order to train a committee of boosted decision stumps on labeled training data. Given an unknown speech signal, the presence of a desired speech unit is estimated by searching, for each time frame, for the corresponding segment that provides the maximum classification score. This approach improves the accuracy of a phoneme classification task by 1.7% compared to classification using HMMs. Applying this approach to the detection of broad phonetic landmarks inside a landmark-driven HMM-based speech recognizer significantly improves speech recognition.
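A minimal sketch of the boosted-stump committee on fixed-dimensional segment vectors, assuming scikit-learn, whose AdaBoostClassifier uses depth-1 trees (stumps) as its default weak learner; the data and labels here are synthetic, not the paper's speech segments.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Stand-in for fixed-dimensional observation vectors derived from segments.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 16))
# Toy "speech unit present" label depending on two of the dimensions.
y = (X[:, 0] + X[:, 3] > 0).astype(int)

# A committee of 50 boosted decision stumps (AdaBoost's default weak learner
# is a depth-1 decision tree, i.e. a stump).
committee = AdaBoostClassifier(n_estimators=50).fit(X[:300], y[:300])
score = committee.score(X[300:], y[300:])
print(round(score, 2))
```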
Citations: 5
Analysis of speech transcripts to predict winners of U.S. Presidential and Vice-Presidential debates
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424266
Ian Kaplan, Andrew Rosenberg
In this paper, we describe investigations into the speech used in American Presidential and Vice-Presidential debates. We explore possible transcript-based features that may correlate with personally appealing or politically persuasive language. We identify, with chi-squared analysis, features that correlate with success in the debates. We find that with a set of surface-level features from historical debates, we can predict the winners of presidential debates with success moderately above chance.
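A chi-squared analysis of this kind can be run as follows, assuming SciPy; the contingency counts are invented purely for illustration of the mechanics.

```python
from scipy.stats import chi2_contingency

# Rows: transcript feature present / absent; columns: debate won / lost.
# These counts are hypothetical, not from the paper's data.
table = [[30, 10],
         [15, 25]]
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p, 4), dof)
```

A small p-value indicates that the feature's presence is associated with debate outcome, which is how candidate features would be screened.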
Citations: 2
Discriminative spoken language understanding using word confusion networks
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424218
Matthew Henderson, Milica Gasic, Blaise Thomson, P. Tsiakoulis, Kai Yu, S. Young
Current commercial dialogue systems typically use hand-crafted grammars for Spoken Language Understanding (SLU) operating on the top one or two hypotheses output by the speech recogniser. These systems are expensive to develop and they suffer from significant degradation in performance when faced with recognition errors. This paper presents a robust method for SLU based on features extracted from the full posterior distribution of recognition hypotheses encoded in the form of word confusion networks. Following [1], the system uses SVM classifiers operating on n-gram features, trained on unaligned input/output pairs. Performance is evaluated on both an off-line corpus and on-line in a live user trial. It is shown that a statistical discriminative approach to SLU operating on the full posterior ASR output distribution can substantially improve performance both in terms of accuracy and overall dialogue reward. Furthermore, additional gains can be obtained by incorporating features from the previous system output.
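One way to realise n-gram features over the full confusion-network posterior, rather than only the 1-best hypothesis, is sketched below; the data structure and the example network are assumptions for illustration, not the paper's implementation.

```python
from itertools import product

def wcn_ngram_features(wcn):
    """Expected n-gram counts from a word confusion network.

    wcn: list of confusion bins, each a dict mapping word -> posterior.
    A unigram is weighted by its posterior; a bigram by the product of the
    posteriors of its two words, so the features reflect the whole
    recognition distribution rather than a single hypothesis.
    """
    feats = {}
    for bin_ in wcn:
        for w, p in bin_.items():
            feats[(w,)] = feats.get((w,), 0.0) + p
    for a, b in zip(wcn, wcn[1:]):
        for (w1, p1), (w2, p2) in product(a.items(), b.items()):
            feats[(w1, w2)] = feats.get((w1, w2), 0.0) + p1 * p2
    return feats

# Toy network: the recogniser is unsure between "cheap" and "jeep".
wcn = [{"cheap": 0.6, "jeep": 0.4}, {"restaurant": 1.0}]
f = wcn_ngram_features(wcn)
print(f[("cheap",)], f[("cheap", "restaurant")])
```

Feature vectors of this form could then be fed to per-slot SVM classifiers, as the abstract describes.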
Citations: 116
The Bavieca open-source speech recognition toolkit
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424249
Daniel Bolaños
This article describes the design of Bavieca, an open-source speech recognition toolkit intended for speech research and system development. The toolkit supports lattice-based discriminative training, wide phonetic-context, efficient acoustic scoring, large n-gram language models, and the most common feature and model transformations. Bavieca is written entirely in C++ and presents a simple and modular design with an emphasis on scalability and reusability. Bavieca achieves competitive results in standard benchmarks. The toolkit is distributed under the highly unrestricted Apache 2.0 license, and is freely available on SourceForge.
Citations: 21
Speech-based emotion classification using multiclass SVM with hybrid kernel and thresholding fusion
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424267
Na Yang, R. Muraleedharan, J. Kohl, I. Demirkol, W. Heinzelman, Melissa L. Sturge‐Apple
Emotion classification is essential for understanding human interactions and hence is a vital component of behavioral studies. Although numerous algorithms have been developed, the emotion classification accuracy is still short of what is desired for the algorithms to be used in real systems. In this paper, we evaluate an approach where basic acoustic features are extracted from speech samples, and the One-Against-All (OAA) Support Vector Machine (SVM) learning algorithm is used. We use a novel hybrid kernel, where we choose the optimal kernel functions for the individual OAA classifiers. Outputs from the OAA classifiers are normalized and combined using a thresholding fusion mechanism to finally classify the emotion. Samples with low `relative confidence' are left as `unclassified' to further improve the classification accuracy. Results show that the decision-level recall of our approach for six-class emotion classification is 80.5%, outperforming a state-of-the-art approach that uses the same dataset.
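A toy rendering of the One-Against-All setup with per-class kernels and thresholding fusion, assuming scikit-learn and synthetic blob data; the score normalisation, the threshold value, and the kernel assignment are all illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Three "emotion" classes; one binary OAA SVM per class, each allowed its
# own kernel (a crude stand-in for the paper's hybrid-kernel idea).
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.5, random_state=0)
kernels = {0: "rbf", 1: "linear", 2: "rbf"}
clfs = {c: SVC(kernel=k).fit(X, (y == c).astype(int)) for c, k in kernels.items()}

def classify(x, threshold=0.1):
    scores = np.array([clfs[c].decision_function([x])[0] for c in sorted(clfs)])
    norm = scores - scores.min()          # shift scores to be non-negative
    if norm.sum() > 0:
        norm = norm / norm.sum()          # normalise across the OAA outputs
    top2 = np.sort(norm)[-2:]
    margin = top2[1] - top2[0]            # "relative confidence" of the winner
    # Thresholding fusion: low-confidence samples are left unclassified.
    return int(np.argmax(norm)) if margin >= threshold else None

preds = [classify(x) for x in X]
confident = [(p, t) for p, t in zip(preds, y) if p is not None]
acc = float(np.mean([p == t for p, t in confident]))
print(round(acc, 2), len(confident))
```

Leaving ambiguous samples unclassified trades coverage for precision, which is how the abstract's accuracy gain is obtained.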
Citations: 54
Policy optimisation of POMDP-based dialogue systems without state space compression
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424165
Milica Gasic, Matthew Henderson, Blaise Thomson, P. Tsiakoulis, S. Young
The partially observable Markov decision process (POMDP) has been proposed as a dialogue model that enables automatic improvement of the dialogue policy and robustness to speech understanding errors. It requires, however, a large number of dialogues to train the dialogue policy. Gaussian processes (GP) have recently been applied to POMDP dialogue management optimisation showing an ability to substantially increase the speed of learning. Here, we investigate this further using the Bayesian Update of Dialogue State dialogue manager. We show that it is possible to apply Gaussian processes directly to the belief state, removing the need for a parametric policy representation. In addition, the resulting policy learns significantly faster while maintaining operational performance.
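The idea of regressing value directly on uncompressed belief vectors can be sketched with a generic GP regressor, assuming scikit-learn; this is plain regression on a toy value function over the belief simplex, not the GP-SARSA machinery of the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Belief states are points on the probability simplex; the GP operates on
# them directly, with no parametric compression of the state space.
rng = np.random.default_rng(1)
beliefs = rng.dirichlet(np.ones(5), size=60)
values = beliefs[:, 0] - beliefs[:, 4]          # toy expected-return function

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
gp.fit(beliefs, values)

# Predict value (with uncertainty) at unseen belief points.
query = rng.dirichlet(np.ones(5), size=5)
mean, std = gp.predict(query, return_std=True)
err = float(np.max(np.abs(mean - (query[:, 0] - query[:, 4]))))
print(round(err, 3))
```

The predictive uncertainty `std` is what a GP-based learner can exploit to explore efficiently, which is one reason for the faster learning the abstract reports.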
Citations: 21
Automatic transcription of academic lectures from diverse disciplines
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424257
Ghada Alharbi, Thomas Hain
In a multimedia world it is now common to record professional presentations, on video or with audio only. Such recordings include talks and academic lectures, which are becoming a valuable resource for students and professionals alike. However, organising such material from a diverse set of disciplines is not an easy task. One way to address this problem is to build an Automatic Speech Recognition (ASR) system and use its output to analyse such materials. In this work, ASR results for lectures from diverse sources are presented. The work is based on a new collection of data obtained by the Liberated Learning Consortium (LLC). The study's primary goals are two-fold: first, to show variability across disciplines from an ASR perspective and how to choose sources for the construction of language models (LMs); second, to provide an analysis of the lecture transcripts for automatic determination of structures in lecture discourse. In particular, we investigate whether there are properties common to lectures from different disciplines. This study focuses on textual features. Lectures are multimodal experiences, and it is not clear whether textual features alone are sufficient for the recognition of such common elements, or whether other features, e.g. acoustic features such as the speaking rate, are needed. The results show that such common properties are retained across disciplines even on ASR output with a Word Error Rate (WER) of 30%.
Citations: 1