
Latest publications from the 2012 IEEE Spoken Language Technology Workshop (SLT)

Reinforcement learning for spoken dialogue systems using off-policy natural gradient method
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424161
Filip Jurcícek
Reinforcement learning methods have been successfully used to optimise dialogue strategies in statistical dialogue systems. Typically, reinforcement techniques learn on-policy, i.e., the dialogue strategy is updated online while the system is interacting with a user. An alternative to this approach is off-policy reinforcement learning, which estimates an optimal dialogue strategy offline from a fixed corpus of previously collected dialogues. This paper proposes a novel off-policy reinforcement learning method based on natural policy gradients and importance sampling. The algorithm is evaluated on a spoken dialogue system in the tourist information domain. The experiments indicate that the proposed method learns a dialogue strategy which significantly outperforms the baseline handcrafted dialogue policy.
Citations: 2
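The core of the off-policy estimation above is importance weighting: dialogues logged under a fixed behaviour policy are re-weighted by the ratio between the policy being optimised and the behaviour policy. A minimal sketch of that idea follows, with a log-linear policy over three actions and a synthetic logged corpus; it uses a plain (not natural) policy gradient for brevity, and all names and numbers are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy logged corpus of (action, reward) pairs collected under a fixed
# uniform behaviour policy pi_b; action 2 is the only rewarded one.
n_actions = 3
actions = rng.integers(n_actions, size=50)
corpus = [(a, 1.0 if a == 2 else 0.0) for a in actions]

theta = np.zeros(n_actions)                  # target policy parameters
pi_b = np.full(n_actions, 1.0 / n_actions)   # behaviour policy

for _ in range(200):
    pi = softmax(theta)
    grad = np.zeros_like(theta)
    for a, r in corpus:
        w = pi[a] / pi_b[a]                  # importance weight pi_theta / pi_b
        score = -pi                          # d log pi(a) / d theta ...
        score[a] += 1.0                      # ... = onehot(a) - pi
        grad += w * r * score
    theta += 0.01 * grad / len(corpus)
```

After training, the target policy concentrates probability on the rewarded action even though it never interacted with the environment itself.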
Employing boosting to compare cues to verbal feedback in multi-lingual dialog
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424199
Gina-Anne Levow, Siwei Wang
Verbal feedback provides important cues in establishing interactional rapport. The challenge of recognizing contexts for verbal feedback largely arises from relative sparseness and optionality. In addition, cross-language and inter-speaker variations can make recognition more difficult. In this paper, we show that boosting can improve accuracy in recognizing contexts for verbal feedback based on prosodic cues. In our experiments, we use dyads from three languages (English, Spanish and Arabic) to evaluate two boosting methods, generalized Adaboost and Gradient Boosting Trees, against Support Vector Machines (SVMs) and a naive baseline, with explicit oversampling on the minority verbal feedback instances. We find that both boosting methods outperform the baseline and SVM classifiers. Analysis of the feature weighting by the boosted classifiers highlights differences and similarities in the prosodic cues employed by members of these diverse language/cultural groups.
Citations: 1
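The paper compares generalized AdaBoost and Gradient Boosting Trees against SVMs on prosodic features. As a minimal illustration of the boosting side, here is discrete AdaBoost over axis-aligned decision stumps on synthetic two-feature data (the data and parameters are invented for this sketch, not from the paper's corpus).

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over axis-aligned decision stumps; y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                       # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)    # stump weight
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)           # up-weight hard examples
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def predict(stumps, X):
    s = sum(a * sg * np.where(X[:, j] <= t, 1, -1) for a, j, t, sg in stumps)
    return np.where(s >= 0, 1, -1)

# Synthetic two-feature data, separable on feature 0 (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((80, 2))
y = np.where(X[:, 0] > 0.5, 1, -1)
stumps = train_adaboost(X, y)
```

A useful by-product, used in the paper's analysis, is that the learned `alpha` weights indicate which features the ensemble relies on.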
Exploiting loudness dynamics in stochastic models of turn-taking
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424201
K. Laskowski
Stochastic turn-taking models have traditionally been implemented as N-grams, which condition predictions on recent binary-valued speech/non-speech contours. The current work re-implements this function using feed-forward neural networks, capable of accepting binary- as well as continuous-valued features; performance is shown to asymptotically approach that of the N-gram baseline as model complexity increases. The conditioning context is then extended to leverage loudness contours. Experiments indicate that the additional sensitivity to loudness considerably decreases average cross entropy rates on unseen data, by 0.03 bits per framing interval of 100 ms. This reduction is shown to make loudness-sensitive conversants capable of better predictions, with attention memory requirements at least 5 times smaller and responsiveness latency at least 10 times shorter than the loudness-insensitive baseline.
Citations: 6
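The evaluation metric above is average cross entropy in bits per 100 ms framing interval. A minimal sketch of the N-gram baseline side: fit an add-one-smoothed bigram model to a binary speech/non-speech contour and score it. The contour here is invented for illustration.

```python
import numpy as np

# Toy binary speech/non-speech contour, one value per 100 ms frame
# (1 = speech). The sequence is invented for illustration.
frames = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0,
                   1, 1, 1, 1, 0, 0, 0, 1, 1, 0] * 5)

# Bigram (N-gram, N = 2) model of the contour, add-one smoothed:
# counts[prev, next] -> P(next frame | current frame).
counts = np.ones((2, 2))
for prev, nxt in zip(frames[:-1], frames[1:]):
    counts[prev, nxt] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Average cross entropy in bits per framing interval.
xent = -np.mean([np.log2(probs[p, n])
                 for p, n in zip(frames[:-1], frames[1:])])
```

A model that also conditions on loudness contours, as in the paper, would replace the count table with a network accepting continuous features; the reported gain of 0.03 bits per interval is measured on exactly this kind of per-frame cross entropy.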
Towards a new speech event detection approach for landmark-based speech recognition
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424247
Stefan Ziegler, Bogdan Ludusan, G. Gravier
In this work, we present a new approach to the classification and detection of speech units for use in landmark- or event-based speech recognition systems. We use segmentation to model any time-variable speech unit by a fixed-dimensional observation vector, in order to train a committee of boosted decision stumps on labeled training data. Given an unknown speech signal, the presence of a desired speech unit is estimated by searching, for each time frame, for the corresponding segment that provides the maximum classification score. This approach improves the accuracy of a phoneme classification task by 1.7% compared to classification using HMMs. Applying this approach to the detection of broad phonetic landmarks inside a landmark-driven HMM-based speech recognizer significantly improves speech recognition.
Citations: 5
Analysis of speech transcripts to predict winners of U.S. Presidential and Vice-Presidential debates
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424266
Ian Kaplan, Andrew Rosenberg
In this paper, we describe investigations into the speech used in American Presidential and Vice-Presidential debates. We explore possible transcript-based features that may correlate with personally appealing or politically persuasive language. We identify, with chi-squared analysis, features that correlate with success in the debates. We find that with a set of surface-level features from historical debates, we can predict the winners of presidential debates with success moderately above chance.
Citations: 2
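The chi-squared analysis mentioned above tests whether a transcript feature is independent of debate outcome. A small worked example on a hypothetical 2×2 contingency table (all counts invented; with 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05):

```python
import numpy as np

# Hypothetical 2x2 contingency table: debates where a transcript
# feature (say, above-median use of some word class) was high or low,
# split by whether the speaker won. Counts are illustrative only.
table = np.array([[18, 7],    # feature high: won / lost
                  [9, 16]])   # feature low:  won / lost

# Expected counts under independence of feature and outcome.
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()

# Pearson chi-squared statistic.
chi2 = ((table - expected) ** 2 / expected).sum()

# 1 degree of freedom; 3.84 is the 5% critical value.
significant = chi2 > 3.84
```

Features passing this test are the ones the paper then feeds into its winner-prediction experiments.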
The Bavieca open-source speech recognition toolkit
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424249
Daniel Bolaños
This article describes the design of Bavieca, an open-source speech recognition toolkit intended for speech research and system development. The toolkit supports lattice-based discriminative training, wide phonetic-context, efficient acoustic scoring, large n-gram language models, and the most common feature and model transformations. Bavieca is written entirely in C++ and presents a simple and modular design with an emphasis on scalability and reusability. Bavieca achieves competitive results in standard benchmarks. The toolkit is distributed under the highly unrestricted Apache 2.0 license, and is freely available on SourceForge.
Citations: 21
Discriminative spoken language understanding using word confusion networks
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424218
Matthew Henderson, Milica Gasic, Blaise Thomson, P. Tsiakoulis, Kai Yu, S. Young
Current commercial dialogue systems typically use hand-crafted grammars for Spoken Language Understanding (SLU) operating on the top one or two hypotheses output by the speech recogniser. These systems are expensive to develop and they suffer from significant degradation in performance when faced with recognition errors. This paper presents a robust method for SLU based on features extracted from the full posterior distribution of recognition hypotheses encoded in the form of word confusion networks. Following [1], the system uses SVM classifiers operating on n-gram features, trained on unaligned input/output pairs. Performance is evaluated on both an off-line corpus and on-line in a live user trial. It is shown that a statistical discriminative approach to SLU operating on the full posterior ASR output distribution can substantially improve performance both in terms of accuracy and overall dialogue reward. Furthermore, additional gains can be obtained by incorporating features from the previous system output.
Citations: 116
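The key representational idea above is to extract n-gram features from the full posterior encoded in a word confusion network rather than from a single top hypothesis. A sketch of expected n-gram counts under a toy confusion network, assuming independence between slots (the network and posteriors are invented; the SVM classifier on top is omitted):

```python
from itertools import product

# A toy word confusion network: each slot holds (word, posterior)
# alternatives from the recogniser. Values are illustrative.
cn = [
    [("book", 0.7), ("look", 0.3)],
    [("a", 0.6), ("the", 0.4)],
    [("table", 0.9), ("cable", 0.1)],
]

def expected_ngrams(cn, n):
    """Expected n-gram counts under the slot-wise posterior distribution."""
    feats = {}
    for i in range(len(cn) - n + 1):
        for combo in product(*cn[i:i + n]):
            words = tuple(w for w, _ in combo)
            p = 1.0
            for _, q in combo:          # product of slot posteriors
                p *= q
            feats[words] = feats.get(words, 0.0) + p
    return feats

unigrams = expected_ngrams(cn, 1)
bigrams = expected_ngrams(cn, 2)
```

The resulting fractional counts form a fixed feature vector, so even low-posterior alternatives that the 1-best hypothesis discards still contribute evidence to the semantic classifier.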
Train&align: A new online tool for automatic phonetic alignment
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424260
Sandrine Brognaux, Sophie Roekhaut, Thomas Drugman, Richard Beaufort
Several automatic phonetic alignment tools have been proposed in the literature. They usually rely on pre-trained speaker-independent models to align new corpora. Their drawback is that they cover a very limited number of languages and might not perform properly for different speaking styles. This paper presents a new tool for automatic phonetic alignment available online. Its specificity is that it trains the model directly on the corpus to align, which makes it applicable to any language and speaking style. Experiments on three corpora show that it provides results comparable to other existing tools. It also allows the tuning of some training parameters. The use of tied-state triphones, for example, shows further improvement of about 1.5% for a 20 ms threshold. A manually-aligned part of the corpus can also be used as bootstrap to improve the model quality. Alignment rates were found to significantly increase, up to 20%, using only 30 seconds of bootstrapping data.
Citations: 35
Intent transfer in speech-to-speech machine translation
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424214
G. Anumanchipalli, Luís C. Oliveira, A. Black
This paper presents an approach for transfer of speaker intent in speech-to-speech machine translation (S2SMT). Specifically, we describe techniques to retain the prominence patterns of the source language utterance through the translation pipeline and impose this information during speech synthesis in the target language. We first present an analysis of word focus across languages to motivate the problem of transfer. We then propose an approach for training an appropriate transfer function for intonation on a parallel speech corpus in the two languages within which the translation is carried out. We present our analysis and experiments on English↔Portuguese and English↔German language pairs and evaluate the proposed transformation techniques through objective measures.
Citations: 32
Improved semantic retrieval of spoken content by language models enhanced with acoustic similarity graph
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424219
Hung-yi Lee, Tsung-Hsien Wen, Lin-Shan Lee
Retrieving objects semantically related to the query has been widely studied in text information retrieval. However, when applying the text-based techniques on spoken content, the inevitable recognition errors may seriously degrade the performance. In this paper, we propose to enhance the expected term frequencies estimated from spoken content by acoustic similarity graphs. For each word in the lexicon, a graph is constructed describing acoustic similarity among spoken segments in the archive. Score propagation over the graph helps in estimating the expected term frequencies. The enhanced expected term frequencies can be used in the language modeling retrieval approach, as well as semantic retrieval techniques such as the document expansion based on latent semantic analysis, and query expansion considering both words and latent topic information. Preliminary experiments performed on Mandarin broadcast news indicated that improved performance were achievable under different conditions.
Citations: 11
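The score-propagation step above can be sketched in a few lines: each spoken segment's initial term-frequency estimate is repeatedly interpolated with its neighbours' scores over a row-normalised similarity graph, random-walk style. The graph, the initial scores, and the mixing weight `alpha` below are all invented for illustration.

```python
import numpy as np

# Toy symmetric acoustic-similarity graph over four spoken segments
# hypothesised to contain the same word (all values illustrative).
S = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.9],
              [0.0, 0.1, 0.9, 0.0]])

# Initial expected-term-frequency scores from recogniser posteriors.
base = np.array([0.9, 0.2, 0.1, 0.8])

# Row-normalise similarities, then iterate: keep a fraction alpha of the
# original score and take the rest from acoustically similar neighbours.
P = S / S.sum(axis=1, keepdims=True)
alpha = 0.7
score = base.copy()
for _ in range(20):
    score = alpha * base + (1 - alpha) * P @ score
```

Segment 1 starts with a low recogniser score but is strongly similar to a high-scoring segment, so propagation raises its estimate, which is the mechanism the paper uses to compensate for recognition errors.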