
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01): Latest Publications

Unsupervised training of acoustic models for large vocabulary continuous speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034648
F. Wessel, H. Ney
For speech recognition systems, the amount of acoustic training data is of crucial importance. In the past, large amounts of speech were recorded and transcribed manually for training. Since untranscribed speech is available in various forms these days, the unsupervised training of a speech recognizer on recognized transcriptions is studied. A low-cost recognizer trained with only one hour of manually transcribed speech is used to recognize 72 hours of untranscribed acoustic data. These transcriptions are then used in combination with confidence measures to train an improved recognizer. The effect of confidence measures which are used to detect possible recognition errors is studied systematically. Finally, the unsupervised training is applied iteratively. Using this method, the recognizer is trained with very little manual effort while losing only 14.3% relative on the Broadcast News '96 and 18.6% relative on the Broadcast News '98 evaluation test sets.
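The confidence-filtered self-training loop can be sketched as follows; the utterances, confidence scores, and threshold below are invented for illustration and stand in for the paper's Broadcast News setup:

```python
# Hedged sketch of confidence-based filtering of automatic transcriptions.
# In the paper, the surviving pairs would be pooled with the hour of manual
# transcriptions and used to re-estimate the acoustic model, iteratively.

def filter_by_confidence(hypotheses, threshold=0.7):
    """Keep automatically transcribed utterances whose confidence is high
    enough for the hypothesis to serve as a training transcription."""
    return [(audio, text) for audio, text, conf in hypotheses if conf >= threshold]

# One round: recognize untranscribed audio (simulated here), then filter.
recognized = [
    ("utt001", "the market rose sharply", 0.91),
    ("utt002", "uh that was unclear", 0.35),   # likely misrecognized: dropped
    ("utt003", "news at six", 0.82),
]
training_pairs = filter_by_confidence(recognized)
print(len(training_pairs))  # 2 utterances survive the confidence filter
```

The filtered pairs trade coverage for label quality; the paper studies this trade-off systematically by varying the confidence measure.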
Citations: 20
Joint estimation of noise and channel distortion in a generalized EM framework
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034611
T. Krisjansson, B. Frey, L. Deng, A. Acero
The performance of speech cleaning and noise adaptation algorithms is heavily dependent on the quality of the noise and channel models. Various strategies have been proposed in the literature for adapting to the current noise and channel conditions. We describe the joint learning of noise and channel distortion in a novel framework called ALGONQUIN. The learning algorithm employs a generalized EM strategy wherein the E step is approximate. We discuss the characteristics of the new algorithm, with a focus on convergence rates and parameter initialization. We show that the learning algorithm can successfully disentangle the non-linear effects of noise and linear effects of the channel and achieve a relative reduction in WER of 21.8% over the non-adaptive algorithm.
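To make the non-linearity concrete, the following sketch evaluates the standard log-spectral interaction between clean speech, channel, and noise that such joint estimation must invert; the exact interaction function used in ALGONQUIN may differ in detail, and all values are invented:

```python
import numpy as np

# In the log-spectral domain, noisy speech y relates to clean speech x,
# channel h and noise n (all log-magnitudes) approximately as:
#     y = x + h + log(1 + exp(n - x - h))
# The channel enters linearly, the noise non-linearly, which is why the
# two effects must be disentangled by the learning algorithm.

def noisy_log_spectrum(x, h, n):
    return x + h + np.logaddexp(0.0, n - x - h)

x = np.array([5.0, 3.0])   # clean log-spectrum (toy values)
h = np.array([0.5, 0.5])   # channel offset
n = np.array([1.0, 4.0])   # noise log-spectrum
y = noisy_log_spectrum(x, h, n)
# When noise is far below speech (first bin), y is close to x + h;
# when noise dominates, y approaches n.
```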
Citations: 22
Piecewise-linear transformation-based HMM adaptation for noisy speech
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034612
Zhipeng Zhang, S. Furui
This paper proposes a new method using a piecewise-linear transformation for adapting phone HMM to noisy speech. Various noises are clustered according to their acoustic properties and signal-to-noise ratios (SNR), and a noisy speech HMM corresponding to each clustered noise is made. Based on the likelihood maximization criterion, the HMM which best matches the input speech is selected and further adapted using a linear transformation. The proposed method was evaluated by recognizing noisy broadcast-news speech. It was confirmed that the proposed method was effective in recognizing numerically noise-added speech and actual noisy speech by a wide range of speakers under various noise conditions.
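A minimal sketch of the likelihood-based selection step, with the noise-cluster models reduced to hand-picked diagonal Gaussians over frame statistics (the real system selects among full noisy-speech HMMs and then applies a linear transformation):

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Diagonal-Gaussian log-likelihood of feature vector x."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))

def select_cluster(x, cluster_models):
    """Pick the noise cluster whose model best explains the input."""
    scores = {name: gaussian_loglik(x, m, v)
              for name, (m, v) in cluster_models.items()}
    return max(scores, key=scores.get)

# Hypothetical clusters, keyed by noise type and SNR as in the paper.
models = {
    "car_10dB":    (np.array([0.0, 1.0]),  np.array([1.0, 1.0])),
    "babble_20dB": (np.array([3.0, -1.0]), np.array([1.0, 1.0])),
}
x = np.array([2.8, -0.9])          # statistics of the input speech
best = select_cluster(x, models)   # "babble_20dB" scores highest here
```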
Citations: 34
Estimated rank pruning and Java-based speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034669
N. Jevtic, A. Klautau, A. Orlitsky
Most speech recognition systems search through large finite state machines to find the most likely path, or hypothesis. Efficient search in these large spaces requires pruning of some hypotheses. Popular pruning techniques include probability pruning which keeps only hypotheses whose probability falls within a prescribed factor from the most likely one, and rank pruning which keeps only a prescribed number of the most probable hypotheses. Rank pruning provides better control over memory use and search complexity, but it requires sorting of the hypotheses, a time consuming task that may slow the recognition process. We propose a pruning technique which combines the advantages of probability and rank pruning. Its time complexity is similar to that of probability pruning and its search-space size, memory consumption, and recognition accuracy are comparable to those of rank pruning. We also describe a research-motivated Java-based speech recognition system that is being built at UCSD.
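The two baseline schemes can be sketched generically as below; the paper's actual contribution, estimated-rank pruning that avoids the full sort, is not reproduced here:

```python
import heapq

def probability_prune(hyps, beam):
    """Keep hypotheses whose log-probability is within `beam` of the best."""
    best = max(logp for _, logp in hyps)
    return [(h, lp) for h, lp in hyps if lp >= best - beam]

def rank_prune(hyps, k):
    """Keep the k most probable hypotheses. heapq.nlargest runs in
    O(n log k), cheaper than a full sort when k is much smaller than n,
    but a naive implementation still pays the sorting-style cost the
    paper sets out to avoid."""
    return heapq.nlargest(k, hyps, key=lambda hl: hl[1])

hyps = [("a", -1.0), ("b", -2.5), ("c", -1.2), ("d", -9.0)]
print(probability_prune(hyps, beam=2.0))  # drops only "d"
print(rank_prune(hyps, k=2))              # keeps "a" and "c"
```

Note the complementary failure modes: probability pruning bounds quality but not count (memory can blow up), while rank pruning bounds count but may discard hypotheses close to the best.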
Citations: 8
Finite-state transducers for speech-input translation
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034664
F. Casacuberta
Nowadays, hidden Markov models (HMMs) and n-grams are the basic components of the most successful speech recognition systems. In such systems, HMMs (the acoustic models) are integrated into an n-gram or a stochastic finite-state grammar (the language model). Similar models can be used for speech translation, and HMMs (the acoustic models) can be integrated into a finite-state transducer (the translation model). Moreover, the translation process can be performed by searching for an optimal path of states in the integrated network. The output of this search process is a target word sequence associated with the optimal path. In speech translation, HMMs can be trained from a source speech corpus, and the translation model can be learned automatically from a parallel training corpus. This approach has been assessed in the framework of the EUTRANS project, funded by the European Union. Extensive speech-input experiments have been carried out with Spanish-to-English and Italian-to-English translation, in an application involving the interaction (by telephone) of a customer with a receptionist at the front desk of a hotel. A summary of the most relevant results is presented in this paper.
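A toy deterministic transducer makes the optimal-path idea concrete; the states, vocabulary, and costs are invented, whereas real translation models are learned from parallel corpora and searched non-deterministically:

```python
# A tiny weighted finite-state transducer: each arc consumes one source
# word, emits zero or more target words, and adds a cost. Delaying the
# output (empty emission on "habitación") is how the transducer handles
# the Spanish/English word-order difference.

TRANSDUCER = {
    # (state, source_word): (next_state, target_words, cost)
    (0, "una"):        (1, ["a"], 0.1),
    (1, "habitación"): (2, [], 0.2),                  # output delayed
    (2, "doble"):      (3, ["double", "room"], 0.3),  # emitted reordered
}

def translate(words, start=0, finals=frozenset({3})):
    """Follow the unique path for the input; return output and total cost."""
    state, output, cost = start, [], 0.0
    for w in words:
        state, out, c = TRANSDUCER[(state, w)]
        output.extend(out)
        cost += c
    assert state in finals, "input not accepted by the transducer"
    return output, cost

out, cost = translate(["una", "habitación", "doble"])
print(out)  # ['a', 'double', 'room']
```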
Citations: 33
Trend tying in the segmental-feature HMM
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034585
Young-Sun Yun
We present a method to reduce the number of parameters in a segmental-feature HMM (SFHMM). Although the SFHMM shows better results than the conventional HMM (CHMM), it requires more parameters, so a new approach that reduces the parameter count is needed. In general, a trajectory can be separated into a trend and a location. Since the trend captures the variation of segmental features and accounts for a large portion of the SFHMM's parameters, sharing the trend can decrease the number of parameters. The proposed method shares the trend part of trajectories by quantization. Experiments are performed on the TIMIT corpus to examine the effectiveness of trend tying. The experimental results show that performance is almost the same as in previous studies. To obtain better results with a small number of parameters, the various conditions for the trajectory components must be considered.
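A minimal sketch of trend tying, assuming a hand-built two-entry trend codebook (the paper estimates the codebook from data and ties trends across many more segments):

```python
import numpy as np

def split_trajectory(traj):
    """Separate a segment trajectory into a location (its mean) and a
    trend (the mean-removed shape)."""
    location = traj.mean(axis=0)
    return location, traj - location

def tie_trend(trend, codebook):
    """Replace a trend by the index of its nearest codebook entry, so
    many segments share one set of trend parameters."""
    dists = [np.sum((trend - c) ** 2) for c in codebook]
    return int(np.argmin(dists))

codebook = [np.array([[-1.0], [0.0], [1.0]]),   # rising trend
            np.array([[1.0], [0.0], [-1.0]])]   # falling trend

traj = np.array([[4.1], [5.0], [6.2]])          # a rising 3-frame segment
loc, trend = split_trajectory(traj)
idx = tie_trend(trend, codebook)
print(idx)  # 0: this segment is tied to the shared rising trend
```

Only the per-segment location and the codeword index remain segment-specific; the trend parameters are stored once per codeword.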
Citations: 0
Multiple time resolutions for derivatives of Mel-frequency cepstral coefficients
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034583
G. Stemmer, C. Hacker, E. Noth, H. Niemann
Most speech recognition systems are based on Mel-frequency cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. We present an approach to improve the representation of speech dynamics which is based on the combination of multiple time resolutions. The resulting feature vector is transformed to reduce its dimension and the correlation between the features. Another possibility, which has also been evaluated, is to use probabilistic PCA (PPCA) for the output distributions of the HMMs. Different configurations of multiple time resolutions are evaluated as well. When compared to the baseline system, a significant reduction of the word error rate can be achieved.
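The regression-based delta computation, evaluated at two window lengths, is one way to sketch the multiple-resolution idea; the paper's exact combination and the subsequent dimension-reducing transform are not reproduced:

```python
import numpy as np

def deltas(feats, N):
    """Regression-line slope per frame over a +/-N window:
        d_t = sum_{tau=1..N} tau * (c_{t+tau} - c_{t-tau}) / (2 * sum tau^2)
    feats: (T, D) array of cepstra; edges are handled by repeating the
    first/last frame."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(tau * tau for tau in range(1, N + 1))
    d = np.zeros_like(feats)
    for tau in range(1, N + 1):
        d += tau * (padded[N + tau:N + tau + T] - padded[N - tau:N - tau + T])
    return d / denom

feats = np.arange(12, dtype=float).reshape(6, 2)  # toy, linearly rising cepstra
fast = deltas(feats, N=2)   # short window: sharper time resolution
slow = deltas(feats, N=4)   # long window: smoother estimate
multi = np.hstack([feats, fast, slow])
print(multi.shape)  # (6, 6)
```

A longer window smooths out fast transitions; concatenating both resolutions lets a later transform pick whichever is informative per dimension.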
Citations: 10
Construction of model-space constraints
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034591
Patrick Nguyen, Luca Rigazio, C. Wellekens, J. Junqua
HMM systems exhibit a large amount of redundancy. To exploit it, a technique called eigenvoices was found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called the eigenspace, obtained through a PCA of the training speakers. We show how PCA can be linked to the maximum-likelihood criterion. Then, we extend the method to LDA transformations and piecewise linear constraints. On the Wall Street Journal (WSJ) dictation task, we obtain a 1.7% WER improvement (15% relative) when using self-adaptation.
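An eigenvoice-style constraint can be sketched with plain PCA over speaker supervectors; the dimensions and data are toy values, and the maximum-likelihood weight estimation the paper derives is replaced here by a simple least-squares projection:

```python
import numpy as np

# Stack each training speaker's model parameters into one "supervector",
# then find the principal directions ("eigenvoices") of their variation.
rng = np.random.default_rng(0)
supervectors = rng.normal(size=(20, 50))   # 20 hypothetical training speakers

mean = supervectors.mean(axis=0)
centered = supervectors - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
K = 3
eigenvoices = Vt[:K]                       # (K, 50): the linear constraint

def project_to_eigenspace(new_supervector):
    """Adapted model = mean + sum_k w_k * eigenvoice_k. A new speaker is
    described by only K weights instead of 50 free parameters."""
    w = eigenvoices @ (new_supervector - mean)
    return mean + w @ eigenvoices, w

adapted, w = project_to_eigenspace(rng.normal(size=50))
print(w.shape)  # (3,) speaker weights
```

Constraining adaptation to this low-dimensional subspace is what makes it robust with very little adaptation data.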
Citations: 3
Histogram based normalization in the acoustic feature space
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034579
S. Molau, Michael Pitz, H. Ney
We describe a technique called histogram normalization that aims at normalizing feature space distributions at different stages in the signal analysis front-end, namely the log-compressed filterbank vectors, cepstrum coefficients, and LDA (linear discriminant analysis) transformed acoustic vectors. Best results are obtained at the filterbank, and in most cases there is a minor additional gain when normalization is applied sequentially at different stages. We show that histogram normalization performs best if applied both in training and recognition, and that smoothing the target histogram obtained on the training data is also helpful. On the VerbMobil II corpus, a German large-vocabulary conversational speech recognition task, we achieve an overall reduction in word error rate of about 10% relative.
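Histogram normalization can be sketched as per-dimension quantile mapping of test features onto the training distribution; where in the front-end this is applied, and the smoothing of the target histogram, follow the paper rather than this toy:

```python
import numpy as np

def histogram_normalize(test, train):
    """Map each test value through its empirical CDF and then through the
    inverse CDF of the training data, independently per dimension."""
    out = np.empty_like(test, dtype=float)
    for d in range(test.shape[1]):
        ranks = np.argsort(np.argsort(test[:, d]))       # 0..T-1 per value
        quantiles = (ranks + 0.5) / test.shape[0]        # empirical CDF
        out[:, d] = np.quantile(train[:, d], quantiles)  # inverse target CDF
    return out

rng = np.random.default_rng(1)
train = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))   # target distribution
test = rng.normal(loc=5.0, scale=2.0, size=(200, 2))     # shifted "channel"
norm = histogram_normalize(test, train)
# After mapping, the per-dimension means sit near the training means.
```

Unlike mean/variance normalization, this matches the whole distribution, so it also compensates non-linear distortions of the feature scale.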
Citations: 70
ASR in portable wireless devices
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034597
Olli Viikki
This paper discusses the applicability and role of automatic speech recognition in portable wireless devices. Due to the author's background, the viewpoints are somewhat biased towards mobile telephones, but many of the aspects are common to other portable devices as well. While speaker-dependent technology still dominates, there are signs that ASR in wireless devices is also moving towards speaker-independent systems. As these modern communication devices are usually intended for mass markets, the paper reviews the ASR areas that are relevant for speech recognition on low-cost embedded systems. In particular, multilingual ASR, low-complexity ASR algorithms and their implementation, and acoustic model adaptation techniques play a key role in enabling cost-effective realization of ASR systems. Low complexity and advanced noise robustness are sometimes conflicting goals. The paper also briefly reviews some of the most important noise-robust ASR techniques that are well suited for embedded systems.
Citations: 17