
Latest publications: 2012 IEEE Spoken Language Technology Workshop (SLT)

Recovery of acronyms, out-of-lattice words and pronunciations from parallel multilingual speech
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424248
João Miranda, J. Neto, A. Black
In this work we present a set of techniques which explore information from multiple, different language versions of the same speech, to improve Automatic Speech Recognition (ASR) performance. Using this redundant information we are able to recover acronyms, words that cannot be found in the multiple hypotheses produced by the ASR systems, and pronunciations absent from their pronunciation dictionaries. When used together, the three techniques yield a relative improvement of 5.0% over the WER of our baseline system, and 24.8% relative when compared with standard speech recognition, in a Europarl Committee dataset with three different languages (Portuguese, Spanish and English). One full iteration of the system has a parallel Real Time Factor (RTF) of 3.08 and a sequential RTF of 6.44.
Citations: 4
A critical analysis of two statistical spoken dialog systems in public use
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424197
J. Williams
This paper examines two statistical spoken dialog systems deployed to the public, extending an earlier study on one system [1]. Results across the two systems show that statistical techniques improved performance in some cases, but degraded performance in others. Investigating degradations, we find the three main causes are (non-obviously) inaccurate parameter estimates, poor confidence scores, and correlations in speech recognition errors. We also find evidence for fundamental weaknesses in the formulation of the model as a generative process, and briefly show the potential of a discriminatively-trained alternative.
Citations: 22
American sign language fingerspelling recognition with phonological feature-based tandem models
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424208
Taehwan Kim, Karen Livescu, Gregory Shakhnarovich
We study the recognition of fingerspelling sequences in American Sign Language from video using tandem-style models, in which the outputs of multilayer perceptron (MLP) classifiers are used as observations in a hidden Markov model (HMM)-based recognizer. We compare a baseline HMM-based recognizer, a tandem recognizer using MLP letter classifiers, and a tandem recognizer using MLP classifiers of phonological features. We present experiments on a database of fingerspelling videos. We find that the tandem approaches outperform an HMM-based baseline, and that phonological feature-based tandem models outperform letter-based tandem models.
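The tandem recipe can be illustrated with a minimal sketch (not the authors' code; the letter set, logits, and function names here are made up): an MLP classifier's output logits are converted to posteriors, and their logarithms serve as the observation features consumed by the HMM-based recognizer.

```python
import math

def softmax(logits):
    """Convert MLP output logits into class posteriors."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tandem_features(logits, floor=1e-8):
    """Tandem-style feature vector: log posteriors of an MLP classifier,
    used as observations by an HMM-based recognizer."""
    return [math.log(max(p, floor)) for p in softmax(logits)]

# toy logits for a 3-class (e.g. 3-letter) classifier
feats = tandem_features([2.0, 0.5, -1.0])
```

The log is taken because HMM observation models behave better on log-posterior features, which are less sharply peaked than raw posteriors.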
Citations: 20
Using syntactic and confusion network structure for out-of-vocabulary word detection
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424215
Alex Marin, T. Kwiatkowski, Mari Ostendorf, Luke Zettlemoyer
This paper addresses the problem of detecting words that are out-of-vocabulary (OOV) for a speech recognition system to improve automatic speech translation. The detection system leverages confidence prediction techniques given a confusion network representation and parsing with OOV word tokens to identify spans associated with true OOV words. Working in a resource-constrained domain, we achieve OOV detection F-scores of 60-66 and reduce word error rate by 12% relative to the case where OOV words are not detected.
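A minimal sketch of the confidence cue described above, assuming a toy confusion network represented as slots of (word, posterior) pairs; the threshold and data are illustrative, not the paper's:

```python
def oov_candidate_slots(confusion_network, threshold=0.5):
    """Flag confusion-network slots whose best word posterior is low --
    a simple confidence-based cue for out-of-vocabulary regions."""
    flagged = []
    for i, slot in enumerate(confusion_network):
        best_word, best_post = max(slot, key=lambda wp: wp[1])
        if best_post < threshold:
            flagged.append((i, best_word, best_post))
    return flagged

cn = [
    [("the", 0.92), ("a", 0.08)],
    [("acme", 0.31), ("ack", 0.29), ("acne", 0.40)],  # uncertain slot
    [("merger", 0.85), ("merge", 0.15)],
]
# slot 1 is flagged as a likely OOV region
```

In the paper the confidence cue is combined with parsing over OOV word tokens; this sketch shows only the confusion-network side.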
Citations: 21
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424210
Jinyu Li, Dong Yu, J. Huang, Y. Gong
Context-dependent deep neural network hidden Markov model (CD-DNN-HMM) is a recently proposed acoustic model that significantly outperformed Gaussian mixture model (GMM)-HMM systems in many large vocabulary speech recognition (LVSR) tasks. In this paper we present our strategy of using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework. We show that DNNs provide the flexibility of using arbitrary features. By using the Mel-scale log-filter bank features we not only achieve higher recognition accuracy than using MFCCs, but also can formulate the mixed-bandwidth training problem as a missing feature problem, in which several feature dimensions have no value when narrowband speech is presented. This treatment makes training CD-DNN-HMMs with mixed-bandwidth data an easy task since no bandwidth extension is needed. Our experiments on voice search data indicate that the proposed solution not only provides higher recognition accuracy for the wideband speech but also allows the same CD-DNN-HMM to recognize mixed-bandwidth speech. By exploiting mixed-bandwidth training data CD-DNN-HMM outperforms fMPE+BMMI trained GMM-HMM, which cannot benefit from using narrowband data, by 18.4%.
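The missing-feature treatment can be sketched as follows (a toy illustration, not the paper's pipeline; real systems use on the order of 24-40 mel filter banks rather than the 5 assumed here): narrowband frames are mapped into the wideband feature layout, with the upper bands that an 8 kHz signal cannot observe filled by a placeholder value.

```python
def pad_narrowband(frame, wideband_dims, missing_value=0.0):
    """Map a narrowband log filter-bank frame into the wideband layout.

    The upper mel bands, which have no energy in narrowband audio, are
    treated as missing features and filled with a placeholder so that
    wideband and narrowband data share one DNN input layout."""
    assert len(frame) <= wideband_dims
    return list(frame) + [missing_value] * (wideband_dims - len(frame))

nb = [1.2, 0.7, 0.3]  # toy: 3 observable narrowband mel bands
wb_frame = pad_narrowband(nb, wideband_dims=5)
# -> [1.2, 0.7, 0.3, 0.0, 0.0]
```

Because the DNN input dimensionality never changes, no bandwidth extension is needed; the network simply learns to treat the padded dimensions as uninformative for narrowband frames.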
Citations: 140
Noisy channel adaptation in language identification
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424241
Sriram Ganapathy, M. Omar, Jason W. Pelecanos
Language identification (LID) of speech data recorded over noisy communication channels is a challenging problem, especially when the LID system is tested on speech data from an unseen communication channel (not seen in training). In this paper, we consider the scenario in which a small amount of adaptation data is available from a new communication channel. Various approaches are investigated for efficient utilization of the adaptation data in a supervised as well as unsupervised setting. In a supervised adaptation framework, we show that support vector machines (SVMs) with higher order polynomial kernels (HO-SVM) trained using lower dimensional representations of the Gaussian mixture model supervectors (GSVs) provide significant performance improvements over the baseline SVM-GSV system. In these LID experiments, we obtain a 30% reduction in error rate with 6 hours of adaptation data for a new channel. For unsupervised adaptation, we develop an iterative procedure for re-labeling the development data using a co-training framework. In these experiments, we obtain considerable improvements (a relative improvement of 13%) over a self-training framework with the HO-SVM models.
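A rough sketch of the two building blocks named above, under simplifying assumptions (real GSVs are typically built from MAP-adapted means and normalized by mixture weights and covariances, which this toy version omits):

```python
def gmm_supervector(means):
    """Stack adapted Gaussian mean vectors into one supervector (GSV)."""
    return [x for mean in means for x in mean]

def poly_kernel(u, v, degree=3, coef0=1.0):
    """Higher-order polynomial kernel over supervectors: (u . v + c)^d."""
    dot = sum(a * b for a, b in zip(u, v))
    return (dot + coef0) ** degree

# toy 2-component GMM with 2-dimensional means
gsv = gmm_supervector([[1.0, 0.0], [0.0, 2.0]])
```

The higher-order kernel lets the SVM model interactions between supervector dimensions that a linear GSV kernel cannot capture.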
Citations: 3
Exemplar-based voice conversion in noisy environment
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424242
R. Takashima, T. Takiguchi, Y. Ariki
This paper presents a voice conversion (VC) technique for noisy environments, where parallel exemplars are introduced to encode the source speech signal and synthesize the target speech signal. The parallel exemplars (dictionary) consist of the source exemplars and target exemplars, having the same texts uttered by the source and target speakers. The input source signal is decomposed into the source exemplars, noise exemplars obtained from the input signal, and their weights (activities). Then, by using the weights of the source exemplars, the converted signal is constructed from the target exemplars. We carried out speaker conversion tasks using clean speech data and noise-added speech data. The effectiveness of this method was confirmed by comparing its effectiveness with that of a conventional Gaussian Mixture Model (GMM)-based method.
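Exemplar-based conversion of this kind is commonly implemented with non-negative matrix factorization; the sketch below is a hypothetical single-frame illustration (Euclidean multiplicative updates for brevity, where the paper's pipeline may differ): activations are estimated over the source dictionary, then reused with the parallel target dictionary to synthesize the converted frame.

```python
def nmf_activations(x, dictionary, n_iter=200, eps=1e-9):
    """Estimate non-negative activations h so that sum_k h[k]*dictionary[k] ~ x,
    via multiplicative updates (Euclidean objective)."""
    K, D = len(dictionary), len(x)
    h = [1.0] * K
    for _ in range(n_iter):
        recon = [sum(h[k] * dictionary[k][d] for k in range(K)) for d in range(D)]
        for k in range(K):
            num = sum(dictionary[k][d] * x[d] for d in range(D))
            den = sum(dictionary[k][d] * recon[d] for d in range(D)) + eps
            h[k] *= num / den
    return h

# toy parallel dictionaries: source and target exemplars share indices
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 2.0], [3.0, 0.0]]
h = nmf_activations([0.5, 0.25], src)
converted = [sum(h[k] * tgt[k][d] for k in range(2)) for d in range(2)]
```

Noise robustness in the paper comes from including noise exemplars in the source dictionary, so noise energy is absorbed by activations that are simply not carried over to the target side; this sketch omits the noise exemplars.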
Citations: 139
Evaluating the effect of normalizing informal text on TTS output
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424271
Deana Pennell, Yang Liu
Abbreviations in informal text, and research efforts to expand them to the standard English words from which they were derived, have become increasingly common. These methods are almost solely evaluated using the final word error rate (WER) after normalization; however, this metric may not be reasonable for a text-to-speech (TTS) system where words may be pronounced correctly despite being misspelled. This paper shows that normalization of informal text improves the output of TTS not only in terms of WER but also in terms of phoneme error rate (PER) and human perceptual experiments.
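Since the evaluation above centers on WER, here is the standard word-level edit-distance computation of it (a generic definition, not the authors' scoring tool):

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + insertions + deletions) / len(ref),
    computed by word-level edit distance (Levenshtein)."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # match / substitution
    return d[len(r)][len(h)] / len(r)

wer = word_error_rate("see you tomorrow", "c u tomorrow")
# 2 substitutions over 3 reference words
```

The paper's point is visible in this example: "c" and "u" count as errors under WER even though a TTS front end might pronounce them acceptably, which is why PER and perceptual tests are reported as well.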
Citations: 1
Context dependent recurrent neural network language model
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424228
Tomas Mikolov, G. Zweig
Recurrent neural network language models (RNNLMs) have recently demonstrated state-of-the-art performance across a variety of tasks. In this paper, we improve their performance by providing a contextual real-valued input vector in association with each word. This vector is used to convey contextual information about the sentence being modeled. By performing Latent Dirichlet Allocation using a block of preceding text, we achieve a topic-conditioned RNNLM. This approach has the key advantage of avoiding the data fragmentation associated with building multiple topic models on different data subsets. We report perplexity results on the Penn Treebank data, where we achieve a new state-of-the-art. We further apply the model to the Wall Street Journal speech recognition task, where we observe improvements in word-error-rate.
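The contextual-input idea can be sketched as a single recurrent step in which a real-valued context vector (e.g. an LDA topic posterior over the preceding text) drives the hidden layer alongside the word input and the recurrent state; names such as `W_ctx` are illustrative, not from the paper:

```python
import math

def rnn_step(word_vec, context_vec, hidden, W_in, W_ctx, W_rec):
    """One step of a context-dependent RNN LM hidden layer: the
    pre-activation sums contributions from the word input, an extra
    real-valued context vector, and the recurrent state."""
    def matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]
    pre = [a + b + c for a, b, c in zip(matvec(W_in, word_vec),
                                        matvec(W_ctx, context_vec),
                                        matvec(W_rec, hidden))]
    return [1.0 / (1.0 + math.exp(-z)) for z in pre]  # sigmoid
```

Because the topic vector is recomputed from a sliding block of preceding text, one model covers all topics, avoiding the data fragmentation of training separate topic-specific LMs.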
Citations: 592
Statistical semantic interpretation modeling for spoken language understanding with enriched semantic features
Pub Date : 2012-12-01 DOI: 10.1109/SLT.2012.6424225
Asli Celikyilmaz, Dilek Z. Hakkani-Tür, Gökhan Tür
In natural language human-machine statistical dialog systems, semantic interpretation is a key task typically performed following semantic parsing, and aims to extract canonical meaning representations of semantic components. In the literature, manually built rules are usually used for this task, even for implicitly mentioned non-named semantic components (like the genre of a movie or the price range of a restaurant). In this study, we present statistical methods for modeling interpretation, which can also benefit from semantic features extracted from large in-domain knowledge sources. We extract features from user utterances using a semantic parser, and additional semantic features from textual sources (online reviews, synopses, etc.) using a novel tree clustering approach, to represent unstructured information that corresponds to implicit semantic components related to targeted slots in the user's utterances. We evaluate our models on a virtual personal assistance system and demonstrate that our interpreter is effective: it not only improves utterance interpretation in spoken dialog systems (reducing the interpretation error rate by 36% relative to a language model baseline), but also unveils hidden semantic units that are otherwise nearly impossible to extract from the purely manual lexical features typically used in utterance interpretation.
Citations: 9