
Latest publications: 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

Dynamic language modeling for a daily broadcast news transcription system
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430103
Ciro Martins, A. Teixeira, J. Neto
When transcribing Broadcast News data in highly inflected languages, vocabulary growth leads to high out-of-vocabulary (OOV) rates. To address this problem, we propose a daily, unsupervised adaptation approach which dynamically adapts the active vocabulary and LM to the topic of the current news segment during a multi-pass speech recognition process. Based on texts daily available on the Web, a story-based vocabulary is selected using a morpho-syntactic technique. Using an Information Retrieval engine, relevant documents are extracted from a large corpus to generate a story-based LM. Experiments were carried out for a European Portuguese BN transcription system. Preliminary results yield a relative reduction of 65.2% in OOV rate and 6.6% in WER.
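The OOV metric that drives this adaptation can be sketched in a few lines (a minimal illustration; `oov_rate`, the word lists, and the "story" are hypothetical, not the paper's data):

```python
def oov_rate(vocabulary, test_words):
    """Fraction of test tokens not covered by the active vocabulary."""
    vocab = set(vocabulary)
    misses = sum(1 for w in test_words if w not in vocab)
    return misses / len(test_words)

# A topic-adapted vocabulary covers more of the current story's tokens.
static_vocab = ["the", "news", "report", "said"]
adapted_vocab = static_vocab + ["election", "ballot", "candidate"]
story = ["the", "election", "report", "said", "ballot", "candidate"]

print(oov_rate(static_vocab, story))   # → 0.5 (static vocabulary misses topic words)
print(oov_rate(adapted_vocab, story))  # → 0.0 (story-based vocabulary covers them)
```

The paper's contribution is choosing `adapted_vocab` automatically each day from Web texts; the sketch only shows why a smaller, topic-matched vocabulary can still lower OOV.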
Citations: 34
A compact semidefinite programming (SDP) formulation for large margin estimation of HMMS in speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430130
Yan Yin, Hui Jiang
In this paper, we study a new semidefinite programming (SDP) formulation to improve optimization efficiency for large margin estimation (LME) of HMMs in speech recognition. We re-formulate the same LME problem as smaller-scale SDP problems to speed up SDP-based LME training, especially for large model sets. In the new formulation, instead of building the SDP problem from a single huge variable matrix, we formulate it from many small independent variable matrices, each built separately from a Gaussian mean vector. Moreover, we propose to further decompose feature vectors and Gaussian mean vectors into their static, delta, and acceleration components to build even more compact variable matrices. This method can significantly reduce the total number of free variables and results in a much smaller SDP problem even for the same model set. The proposed LME/SDP methods have been evaluated on a connected digit string recognition task using the TIDIGITS database. Experimental results show that they can significantly improve optimization efficiency (about 30-50 times faster for large model sets) while providing slightly better optimization accuracy and recognition performance than our previous SDP formulation.
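The size reduction can be seen with back-of-the-envelope variable counting (hypothetical dimensions; the numbers below are illustrative, not the paper's exact formulation):

```python
def sym_vars(n):
    """Free variables in an n-by-n symmetric matrix."""
    return n * (n + 1) // 2

num_gaussians = 1000   # hypothetical model set size
dim = 39               # typical static+delta+acceleration feature dimension

# One huge variable matrix stacking all Gaussian means together:
single = sym_vars(num_gaussians * dim)

# One small independent matrix per Gaussian mean instead:
per_gaussian = num_gaussians * sym_vars(dim)

# Decomposing each mean into static/delta/acceleration thirds shrinks it further:
per_component = num_gaussians * 3 * sym_vars(dim // 3)

print(single, per_gaussian, per_component)  # strictly decreasing variable counts
```

The quadratic growth of `sym_vars` is exactly why splitting one large matrix into many small blocks cuts the free-variable count so sharply.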
Citations: 7
Extensible speech recognition system using proxy-agent
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430181
Teppei Nakano, S. Fujie, Tetsunori Kobayashi
This paper presents an extension framework for a speech recognition system. This framework is designed to use "proxy-agent," a software component located between applications, speech recognition engines, and input devices. By taking advantage of its structural characteristics, proxy-agent can provide supplementary services for speech recognition systems as well as user extensions. A monitoring capability, a feedback capability, and an extension capability are implemented and presented in this paper. For the first prototype, we developed a data collection application and an application control system using proxy-agent. Through these developments, we verified the effectiveness of the data collection capability of proxy-agent, and the framework extension capability.
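The proxy-agent's position between application and engine is what enables monitoring as a side effect of normal use. A minimal sketch of that idea (the `ProxyAgent` class and the stand-in engine are hypothetical, not the paper's implementation):

```python
class ProxyAgent:
    """Sketch of a component sitting between an application and a
    recognition engine, adding a monitoring hook on the way through."""

    def __init__(self, engine):
        self.engine = engine
        self.log = []            # monitoring capability: record all traffic

    def recognize(self, audio):
        result = self.engine(audio)        # forward to the real engine
        self.log.append((audio, result))   # data collection as a side effect
        return result

# A stand-in "engine" for illustration only; a real one would decode audio.
fake_engine = lambda text: text.upper()

proxy = ProxyAgent(fake_engine)
print(proxy.recognize("hello"))   # → HELLO
print(len(proxy.log))             # → 1 logged interaction
```

Because the application only sees the proxy's interface, extensions such as feedback or data collection can be added without touching either the application or the engine.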
Citations: 3
Dynamic vocabulary prediction for isolated-word dictation on embedded devices
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430172
Jussi Leppänen, Jilei Tian
Large-vocabulary speech recognition systems have mainly been developed for fast processors and large amounts of memory that are available on desktop computers and network servers. Much progress has been made towards running these systems on portable devices. Challenges still exist, however, when developing highly efficient algorithms for real-time speech recognition on resource-limited embedded platforms. In this paper, a dynamic vocabulary prediction approach is proposed to decrease the memory footprint of the speech recognizer decoder by keeping the decoder vocabulary small. This leads to reduced acoustic confusion as well as achieving very efficient use of computational resources. Experiments on an isolated-word SMS dictation task have shown that 40% of the vocabulary prediction errors can be eliminated compared to the baseline system.
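One way to keep the decoder vocabulary small is to predict likely continuations from context and activate only those. A toy sketch of the idea (bigram counts over a hypothetical SMS corpus; not the paper's predictor):

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which words follow which in the training sentences."""
    follows = defaultdict(Counter)
    for sent in corpus:
        for prev, cur in zip(sent, sent[1:]):
            follows[prev][cur] += 1
    return follows

def predict_vocab(follows, prev_word, k):
    """Keep only the k most likely continuations as the active vocabulary."""
    return [w for w, _ in follows[prev_word].most_common(k)]

corpus = [["send", "message", "to", "john"],
          ["send", "message", "to", "mary"],
          ["send", "mail", "to", "john"]]
model = train_bigrams(corpus)
print(predict_vocab(model, "send", 1))  # → ['message']
```

Decoding against this small active subset, rather than the full lexicon, is what reduces both memory footprint and acoustic confusion.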
Citations: 1
Random discriminant structure analysis for automatic recognition of connected vowels
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430176
Y. Qiao, S. Asakawa, N. Minematsu
The universal structure of speech [1, 2] proves to be invariant to transformations in feature space, and thus provides a robust representation for speech recognition. One of the difficulties of using the structure representation is its high dimensionality. This not only increases computational cost but also easily suffers from the curse of dimensionality [3, 4]. In this paper, we introduce random discriminant structure analysis (RDSA) to deal with this problem. Based on the observation that structural features are highly correlated and include large redundancy, RDSA combines random feature selection and discriminative analysis to calculate several low-dimensional and discriminative representations from an input structure. Then an individual classifier is trained for each representation and the outputs of the classifiers are integrated for the final classification decision.
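The random-subspace-plus-voting part of RDSA can be sketched as follows (a toy ensemble with nearest-centroid base classifiers standing in for the discriminant analysis step; all names and data here are hypothetical):

```python
import random
from collections import Counter

def nearest_centroid(train, feats):
    """Base classifier over one feature subset: label of the closest centroid."""
    centroids = {}
    for label, vecs in train.items():
        centroids[label] = [sum(v[f] for v in vecs) / len(vecs) for f in feats]
    def classify(x):
        proj = [x[f] for f in feats]
        return min(centroids, key=lambda l: sum((a - b) ** 2
                                                for a, b in zip(proj, centroids[l])))
    return classify

def random_subspace_ensemble(train, dim, n_views, view_size, seed=0):
    """Train one classifier per random feature subset, vote at test time."""
    rng = random.Random(seed)
    views = [rng.sample(range(dim), view_size) for _ in range(n_views)]
    clfs = [nearest_centroid(train, v) for v in views]
    def classify(x):
        votes = Counter(c(x) for c in clfs)
        return votes.most_common(1)[0][0]
    return classify

# Toy 4-dimensional "structural features" for two vowel classes.
train = {"a": [[0, 0, 0, 0], [0, 1, 0, 0]],
         "i": [[5, 5, 5, 5], [5, 4, 5, 5]]}
clf = random_subspace_ensemble(train, dim=4, n_views=3, view_size=2)
print(clf([0, 0, 1, 0]))  # → a
```

RDSA additionally applies discriminant analysis within each subset; the sketch shows only how many small random views can replace one high-dimensional classifier.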
Citations: 19
Hierarchical large-margin Gaussian mixture models for phonetic classification
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430123
Hung-An Chang, James R. Glass
In this paper we present a hierarchical large-margin Gaussian mixture modeling framework and evaluate it on the task of phonetic classification. A two-stage hierarchical classifier is trained by alternately updating parameters at different levels in the tree to maximize the joint margin of the overall classification. Since the loss function required in the training is convex in the parameter space, the problem of spurious local minima is avoided. The model achieves good performance with fewer parameters than single-level classifiers. In the TIMIT benchmark task of context-independent phonetic classification, the proposed modeling scheme achieves a state-of-the-art phonetic classification error of 16.7% on the core test set. This is an absolute reduction of 1.6% from the best previously reported result on this task, and 4-5% lower than a variety of classifiers that have been recently examined on this task.
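The two-stage decision structure can be sketched independently of the large-margin training (the 1-D scoring functions below are hypothetical stand-ins for Gaussian mixture log-likelihoods):

```python
def hierarchical_classify(x, coarse_models, fine_models):
    """Score broad classes first, then refine within the winning subtree."""
    best_coarse = max(coarse_models, key=lambda c: coarse_models[c](x))
    fine = fine_models[best_coarse]
    return max(fine, key=lambda p: fine[p](x))

# Toy scores: higher is better, peaked at a class-specific location.
coarse = {"vowel": lambda x: -abs(x - 1.0),
          "stop":  lambda x: -abs(x - 5.0)}
fine = {"vowel": {"aa": lambda x: -abs(x - 0.8), "iy": lambda x: -abs(x - 1.4)},
        "stop":  {"p":  lambda x: -abs(x - 4.5), "t":  lambda x: -abs(x - 5.5)}}

print(hierarchical_classify(1.3, coarse, fine))  # → iy
```

Because each level only discriminates among a few candidates, the hierarchy needs fewer parameters than one flat classifier over all phones, which matches the paper's motivation.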
Citations: 36
Spoken language understanding with kernels for syntactic/semantic structures
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430106
Alessandro Moschitti, G. Riccardi, C. Raymond
Automatic concept segmentation and labeling are the fundamental problems of spoken language understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to the dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with support vector machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels.
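A tree kernel measures similarity between parse trees by counting shared fragments. A minimal sketch in the spirit of subset-tree kernels (trees as nested tuples; this is an illustration, not the paper's exact kernel):

```python
def tree_kernel(t1, t2):
    """Sum fragment matches over all node pairs of two trees."""
    return sum(match(a, b) for a in collect(t1) for b in collect(t2))

def collect(t):
    """All nodes of a tree given as (label, child, child, ...)."""
    out = [t]
    for child in t[1:]:
        out += collect(child)
    return out

def match(a, b):
    # Productions must agree: same label and same sequence of child labels.
    if a[0] != b[0] or [c[0] for c in a[1:]] != [c[0] for c in b[1:]]:
        return 0
    if len(a) == 1:               # two matching leaves
        return 1
    prod = 1
    for ca, cb in zip(a[1:], b[1:]):
        prod *= 1 + match(ca, cb)  # each child may or may not extend the fragment
    return prod

t1 = ("S", ("NP", ("john",)), ("VP", ("flies",)))
t2 = ("S", ("NP", ("mary",)), ("VP", ("flies",)))
print(tree_kernel(t1, t1), tree_kernel(t1, t2))  # self-similarity dominates
```

An SVM can use `tree_kernel` directly as its kernel function, which is how syntactic structure enters the discriminative model without explicit feature vectors.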
Citations: 28
Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430136
Yun-Hsuan Sung, Constantinos Boulis, Christopher D. Manning, Dan Jurafsky
We show a number of improvements in the use of Hidden Conditional Random Fields (HCRFs) for phone classification on the TIMIT and Switchboard corpora. We first show that the use of regularization effectively prevents overfitting, improving over other methods such as early stopping. We then show that HCRFs are able to make use of non-independent features in phone classification, at least with small numbers of mixture components, while HMMs degrade due to their strong independence assumptions. Finally, we successfully apply Maximum a Posteriori adaptation to HCRFs, decreasing the phone classification error rate in the Switchboard corpus by around 1%-5% given only small amounts of adaptation data.
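The regularization the paper relies on amounts to adding an L2 penalty to the training objective and its gradient. A generic sketch (the toy objective is hypothetical; HCRF training would supply the real loss and gradient):

```python
def l2_regularized(loss_grad, weights, lam):
    """Wrap a loss/gradient function with an L2 penalty to curb overfitting."""
    loss, grad = loss_grad(weights)
    loss += lam * sum(w * w for w in weights)            # + lam * ||w||^2
    grad = [g + 2 * lam * w for g, w in zip(grad, weights)]
    return loss, grad

# Hypothetical toy objective: loss = sum(w), gradient of all ones.
toy = lambda w: (sum(w), [1.0] * len(w))

loss, grad = l2_regularized(toy, [1.0, -2.0], lam=0.5)
print(loss)  # → 1.5   (-1.0 from the toy loss, +2.5 from the penalty)
print(grad)  # → [2.0, -1.0]
```

The penalty pulls large weights toward zero during optimization, which is what replaces early stopping as the overfitting control.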
Citations: 25
Crosslingual acoustic model development for automatic speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430150
Frank Diehl, A. Moreno, E. Monte‐Moreno
In this work we discuss the development of two cross-lingual acoustic model sets for automatic speech recognition (ASR). The starting point is a set of multilingual Spanish-English-German hidden Markov models (HMMs). The target languages are Slovenian and French. During the discussion the problem of defining a multilingual phoneme set and the associated dictionary mapping is considered. A method is described to circumvent related problems. The impact of the acoustic source models on the performance of the target systems is analyzed in detail. Several cross-lingually defined target systems are built and compared to their monolingual counterparts. It is shown that cross-lingually built acoustic models clearly outperform pure monolingual models if only a limited amount of target data is available.
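The dictionary-mapping step boils down to rewriting each target-language pronunciation into the shared phoneme inventory. A minimal sketch (the phone labels and lexicon entry are hypothetical, not the paper's actual mapping):

```python
def map_pronunciations(lexicon, phone_map):
    """Rewrite a target-language lexicon into the multilingual phoneme set.

    Phones absent from the map are assumed to exist in the shared inventory
    and pass through unchanged.
    """
    return {word: [phone_map.get(p, p) for p in phones]
            for word, phones in lexicon.items()}

# Hypothetical mapping from a target-language phone to a shared-inventory phone.
phone_map = {"c": "ts"}
lexicon = {"cesta": ["c", "e", "s", "t", "a"]}

mapped = map_pronunciations(lexicon, phone_map)
print(mapped)  # → {'cesta': ['ts', 'e', 's', 't', 'a']}
```

With the lexicon rewritten this way, the multilingual HMMs can be used directly to bootstrap the target-language recognizer.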
Citations: 0
An algorithm for fast composition of weighted finite-state transducers
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430156
J. McDonough, Emilian Stoimenov, D. Klakow
In automatic speech recognition based on weighted finite-state transducers, a static decoding graph HC o L o G is typically constructed. In this work, we first show how the size of the decoding graph can be reduced, and the necessity of determinizing it eliminated, by removing the ambiguity associated with transitions to the backoff state or states in G. We then show how the static construction can be avoided entirely by performing fast on-the-fly composition of HC and L o G. We demonstrate that speech recognition based on this on-the-fly composition requires approximately 80% more run-time than recognition based on the statically-expanded network R, which makes it competitive with other dynamic expansion algorithms that have appeared in the literature. Moreover, the dynamic algorithm requires approximately a factor of seven less main memory than recognition based on the static decoding graph.
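The core idea of on-the-fly composition is to expand only the reachable product states as they are needed, never materializing the full static graph. A minimal unweighted sketch (transducers as adjacency dicts of `(input, output, next_state)` arcs; this toy omits weights, epsilon handling, and filters that a real WFST library would need):

```python
from collections import deque

def compose_on_the_fly(A, B, start_a, start_b, finals_a, finals_b):
    """Lazily expand only reachable states of the composed transducer."""
    start = (start_a, start_b)
    arcs, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        qa, qb = queue.popleft()
        out_arcs = []
        for (i, m, na) in A.get(qa, []):
            for (m2, o, nb) in B.get(qb, []):
                if m == m2:                    # A's output must match B's input
                    nxt = (na, nb)
                    out_arcs.append((i, o, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
        arcs[(qa, qb)] = out_arcs
        if qa in finals_a and qb in finals_b:
            finals.add((qa, qb))
    return arcs, finals

# Two tiny transducers: A maps "a"->"x", B maps "x"->"1".
A = {0: [("a", "x", 1)]}
B = {0: [("x", "1", 1)]}
arcs, finals = compose_on_the_fly(A, B, 0, 0, {1}, {1})
print(arcs[(0, 0)])   # → [('a', '1', (1, 1))]
print(finals)         # → {(1, 1)}
```

In a decoder, the queue is driven by the search itself, so only the part of HC o (L o G) touched by active hypotheses is ever built, which is where the memory savings come from.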
Citations: 20