
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01): Latest Publications

Automatic evaluation methods of a speech translation system's capability
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034661
F. Sugaya, K. Yasuda, T. Takezawa, S. Yamamoto
The main goal of the paper is to propose automatic schemes for the translation paired comparison method, which was proposed by the authors to precisely evaluate a speech translation system's capability. In the method, the outputs of the speech translation system are subjectively compared with the translations of native Japanese speakers who have taken the Test of English for International Communication (TOEIC), which is used as a measure of a person's speech translation capability. Experiments are conducted on TDMT, a subsystem of the Japanese-to-English speech translation system ATR-MATRIX developed at ATR Interpreting Telecommunications Research Laboratories. The winning rate of TDMT shows a good correlation with the TOEIC scores of the examinees. A regression analysis on the subjective results shows that the translation capability of TDMT matches that of a person scoring around 700 on the TOEIC. The automatic evaluation methods use DP-based similarity, computed from DP distances between a translation output and multiple translation answers. The answers are collected by two methods: paraphrasing and querying a parallel corpus. In both types of collection, the similarity shows the same good correlation with the TOEIC scores of the examinees as the subjective winning rate. Regression analysis using the similarity places the system's matched point at around 750 on the TOEIC. We also show the effects of paraphrased data.
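As a rough illustration of the DP-based similarity described in this abstract, the sketch below computes a word-level DP (edit) distance between a translation output and each of several reference answers and keeps the best normalized match. The function names (`edit_distance`, `dp_similarity`), the normalization by reference length, and the toy sentences are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of a DP-distance-based similarity, assuming similarity is
# 1 - (edit distance / reference length), maximized over multiple references.
# Function and variable names are illustrative, not taken from the paper.

def edit_distance(hyp, ref):
    """Word-level DP (Levenshtein) distance between two token lists."""
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(hyp)][len(ref)]

def dp_similarity(hypothesis, references):
    """Best similarity of a translation output against multiple answers."""
    scores = []
    for ref in references:
        dist = edit_distance(hypothesis.split(), ref.split())
        scores.append(1.0 - dist / max(len(ref.split()), 1))
    return max(scores)

print(dp_similarity("could i have a room for tonight",
                    ["may i have a room for tonight",
                     "i would like a room for tonight"]))
```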
Citations: 0
Vocabulary independent speech recognition using particles
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034650
E. Whittaker, J.M. Van Thong, P. Moreno
A method is presented for performing speech recognition that does not depend on a fixed word vocabulary. Particles are used as the recognition units in a speech recognition system, which permits word-vocabulary-independent speech decoding. A particle represents a concatenated phone sequence. Each string of particles that represents a word in the one-best hypothesis from the particle speech recognizer is expanded into a list of phonetically similar word candidates using a phone confusion matrix. The resulting word graph is then re-decoded using a word language model to produce the final word hypothesis. Preliminary results on the DARPA HUB4 97 and 98 evaluation sets using word bigram re-decoding of the particle hypothesis show a WER between 2.2% and 2.9% higher than that of a word bigram speech recognizer of comparable complexity. The method has potential applications in spoken document retrieval for recovering out-of-vocabulary words and in client-server based speech recognition.
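The expansion of a recognized particle (phone) string into phonetically similar word candidates might look roughly like the sketch below, which scores lexicon entries with a phone confusion matrix. The toy lexicon, phone set, confusion probabilities, and the restriction to equal-length pronunciations are all illustrative assumptions; the actual system would use DP alignment and real models.

```python
# Illustrative sketch of expanding a particle (phone) sequence into phonetically
# similar word candidates with a phone confusion matrix. The lexicon, phones and
# confusion probabilities below are toy values, not the paper's actual models.

TOY_LEXICON = {               # word -> pronunciation (phone tuple)
    "cat": ("k", "ae", "t"),
    "cap": ("k", "ae", "p"),
    "bat": ("b", "ae", "t"),
}

TOY_CONFUSION = {             # P(observed phone | true phone), sparse
    ("t", "t"): 0.8, ("t", "p"): 0.1, ("t", "k"): 0.1,
    ("k", "k"): 0.9, ("k", "b"): 0.1,
    ("ae", "ae"): 1.0,
    ("p", "p"): 0.8, ("p", "t"): 0.2,
    ("b", "b"): 0.9, ("b", "k"): 0.1,
}

def candidate_words(particle_phones, top_n=3):
    """Score each lexicon word against the recognized phone string."""
    scored = []
    for word, pron in TOY_LEXICON.items():
        if len(pron) != len(particle_phones):
            continue  # a real system would allow insertions/deletions via DP
        score = 1.0
        for obs, true in zip(particle_phones, pron):
            score *= TOY_CONFUSION.get((obs, true), 0.01)
        scored.append((score, word))
    return sorted(scored, reverse=True)[:top_n]

print(candidate_words(("k", "ae", "t")))
```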
Citations: 12
Piecewise-linear transformation-based HMM adaptation for noisy speech
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034612
Zhipeng Zhang, S. Furui
This paper proposes a new method using a piecewise-linear transformation for adapting phone HMM to noisy speech. Various noises are clustered according to their acoustic properties and signal-to-noise ratios (SNR), and a noisy speech HMM corresponding to each clustered noise is made. Based on the likelihood maximization criterion, the HMM which best matches the input speech is selected and further adapted using a linear transformation. The proposed method was evaluated by recognizing noisy broadcast-news speech. It was confirmed that the proposed method was effective in recognizing numerically noise-added speech and actual noisy speech by a wide range of speakers under various noise conditions.
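A much-simplified view of the selection-plus-adaptation step is sketched below: each clustered noise condition is represented by a single diagonal Gaussian (standing in for a full noisy-speech HMM), the best model is chosen by likelihood, and a toy affine shift plays the role of the linear transformation. All names and numbers are illustrative assumptions, not the paper's actual models.

```python
# Highly simplified sketch of the model-selection step: each clustered noise
# condition is represented here by a single diagonal Gaussian instead of a full
# HMM, and the "linear transformation" is a toy global mean shift.
import numpy as np

rng = np.random.default_rng(0)

class NoiseClusterModel:
    def __init__(self, mean, var):
        self.mean, self.var = np.asarray(mean, float), np.asarray(var, float)

    def log_likelihood(self, frames):
        diff = frames - self.mean
        return float(np.sum(-0.5 * (np.log(2 * np.pi * self.var) + diff**2 / self.var)))

def select_and_adapt(models, frames):
    """Pick the best-matching noise-cluster model, then adapt its mean."""
    best = max(models, key=lambda m: m.log_likelihood(frames))
    # Toy affine adaptation: move the mean toward the observed frame average.
    A, b = 1.0, frames.mean(axis=0) - best.mean      # scale and bias
    best.mean = A * best.mean + b
    return best

models = [NoiseClusterModel([0.0, 0.0], [1.0, 1.0]),   # e.g. "car noise, 10 dB SNR"
          NoiseClusterModel([2.0, -1.0], [1.0, 1.0])]  # e.g. "babble noise, 20 dB SNR"
frames = rng.normal([1.8, -0.8], 0.3, size=(50, 2))    # toy noisy-speech features
adapted = select_and_adapt(models, frames)
print(adapted.mean)
```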
Citations: 34
Estimated rank pruning and Java-based speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034669
N. Jevtic, A. Klautau, A. Orlitsky
Most speech recognition systems search through large finite state machines to find the most likely path, or hypothesis. Efficient search in these large spaces requires pruning of some hypotheses. Popular pruning techniques include probability pruning which keeps only hypotheses whose probability falls within a prescribed factor from the most likely one, and rank pruning which keeps only a prescribed number of the most probable hypotheses. Rank pruning provides better control over memory use and search complexity, but it requires sorting of the hypotheses, a time consuming task that may slow the recognition process. We propose a pruning technique which combines the advantages of probability and rank pruning. Its time complexity is similar to that of probability pruning and its search-space size, memory consumption, and recognition accuracy are comparable to those of rank pruning. We also describe a research-motivated Java-based speech recognition system that is being built at UCSD.
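One way to picture pruning that approximates rank pruning without a full sort is the histogram-based threshold below: scores are first limited by a probability-style beam, then binned so that roughly the top max_hyps hypotheses survive. This estimator is an assumption for illustration; the paper's exact rank-estimation scheme may differ.

```python
# Illustrative sketch of pruning that keeps roughly the top-K hypotheses without
# a full sort: scores are binned into a histogram and a threshold is chosen so
# that about max_hyps hypotheses survive. This is an assumed estimator for
# illustration, not necessarily the paper's exact scheme.
import numpy as np

def estimated_rank_prune(log_probs, max_hyps, beam, num_bins=64):
    log_probs = np.asarray(log_probs, float)
    best = log_probs.max()
    # Probability-style beam: drop anything too far below the best hypothesis.
    floor = best - beam
    # Histogram of surviving scores between the beam floor and the best score.
    counts, edges = np.histogram(log_probs[log_probs >= floor],
                                 bins=num_bins, range=(floor, best))
    # Walk bins from the best score down until about max_hyps are accumulated.
    cumulative, threshold = 0, floor
    for i in range(num_bins - 1, -1, -1):
        cumulative += counts[i]
        if cumulative >= max_hyps:
            threshold = edges[i]
            break
    keep = log_probs >= max(threshold, floor)
    return np.flatnonzero(keep)

scores = np.random.default_rng(1).normal(-50.0, 5.0, size=10_000)
kept = estimated_rank_prune(scores, max_hyps=500, beam=12.0)
print(len(kept))
```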
Citations: 8
Finite-state transducers for speech-input translation
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034664
F. Casacuberta
Nowadays, hidden Markov models (HMMs) and n-grams are the basic components of the most successful speech recognition systems. In such systems, HMMs (the acoustic models) are integrated into an n-gram or a stochastic finite-state grammar (the language model). Similar models can be used for speech translation, where the HMMs (the acoustic models) are integrated into a finite-state transducer (the translation model). The translation process can then be performed by searching for an optimal path of states in the integrated network. The output of this search process is the target word sequence associated with the optimal path. In speech translation, the HMMs can be trained from a source speech corpus, and the translation model can be learned automatically from a parallel training corpus. This approach has been assessed in the framework of the EUTRANS project, funded by the European Union. Extensive speech-input experiments have been carried out on Spanish-to-English and Italian-to-English translation, in an application involving the interaction (by telephone) of a customer with a receptionist at the front desk of a hotel. A summary of the most relevant results is presented in this paper.
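A toy weighted finite-state transducer and best-path search are sketched below to illustrate the decoding idea: arcs carry a source word, a target phrase, and a weight, and the target sentence is read off the cheapest accepting path. The states, vocabulary, weights, and uniform-cost search are invented for illustration and are unrelated to the actual EUTRANS transducers.

```python
# Toy weighted finite-state transducer: arcs carry (source word, target phrase,
# weight, next state); translation searches for the cheapest path that accepts
# the input and concatenates the target outputs along that path.
import heapq

ARCS = {
    0: [("una", "a", 0.1, 1)],
    1: [("habitacion", "", 0.2, 2)],                 # output deferred to the next arc
    2: [("doble", "double room", 0.3, 3), ("doble", "twin room", 0.6, 3)],
}
FINAL_STATE = 3

def translate(source_words):
    """Uniform-cost (best-path) search through the transducer."""
    heap = [(0.0, 0, 0, [])]   # (accumulated weight, source position, state, outputs)
    while heap:
        cost, pos, state, out = heapq.heappop(heap)
        if state == FINAL_STATE and pos == len(source_words):
            return " ".join(out), cost               # target sentence on the best path
        if pos == len(source_words):
            continue
        for src, tgt, w, nxt in ARCS.get(state, []):
            if src == source_words[pos]:
                heapq.heappush(heap, (cost + w, pos + 1, nxt, out + ([tgt] if tgt else [])))
    return None, float("inf")

print(translate(["una", "habitacion", "doble"]))     # ('a double room', 0.6)
```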
Citations: 33
Trend tying in the segmental-feature HMM
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034585
Young-Sun Yun
We present a method for reducing the number of parameters in a segmental-feature HMM (SFHMM). Although the SFHMM gives better results than the CHMM, it requires more parameters than the CHMM. Therefore, there is a need for a new approach that reduces the number of parameters. In general, a trajectory can be separated into a trend and a location. Since the trend represents the variation of the segmental features and accounts for a large portion of the SFHMM's parameters, sharing trends can decrease the number of parameters. The proposed method shares the trend part of the trajectories by quantization. Experiments are performed on the TIMIT corpus to examine the effectiveness of trend tying. The experimental results show that the performance is almost the same as that of previous studies. To obtain better results with a small number of parameters, the various conditions on the trajectory components must be considered.
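The trend/location decomposition and trend sharing can be pictured with the sketch below, where each segment's trajectory is reduced to an intercept (location) and slope (trend) by a linear fit, and the slopes are quantized into a small shared codebook. The linear-regression segment model, 1-D features, and k-means codebook are assumptions for illustration, not the paper's exact parameterization.

```python
# Simplified sketch of trend tying: each segment's trajectory is modelled as
# location + trend (intercept + slope of a linear fit), and the slopes are
# quantized into a small shared codebook so many segments reuse one trend.
import numpy as np

rng = np.random.default_rng(0)

def fit_location_and_trend(segment):
    """Least-squares line fit: returns (location=intercept, trend=slope)."""
    t = np.arange(len(segment))
    slope, intercept = np.polyfit(t, segment, deg=1)
    return intercept, slope

def build_trend_codebook(trends, codebook_size=4, iters=20):
    """Tiny 1-D k-means over the trend (slope) values."""
    trends = np.asarray(trends, float)
    centers = rng.choice(trends, size=codebook_size, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(trends[:, None] - centers[None, :]), axis=1)
        for k in range(codebook_size):
            if np.any(labels == k):
                centers[k] = trends[labels == k].mean()
    return centers, labels

# Toy 1-D "feature trajectories" for a few segments.
segments = [np.linspace(0, s, 10) + rng.normal(0, 0.05, 10)
            for s in (1.0, 1.1, -0.5, 2.0, 1.9, -0.4)]
params = [fit_location_and_trend(s) for s in segments]
codebook, tied = build_trend_codebook([slope for _, slope in params], codebook_size=3)
print(codebook, tied)   # segments with similar slopes share a codebook entry
```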
Citations: 0
Multiple time resolutions for derivatives of Mel-frequency cepstral coefficients
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034583
G. Stemmer, C. Hacker, E. Noth, H. Niemann
Most speech recognition systems are based on Mel-frequency cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. We present an approach to improve the representation of speech dynamics which is based on the combination of multiple time resolutions. The resulting feature vector is transformed to reduce its dimension and the correlation between the features. Another possibility, which has also been evaluated, is to use probabilistic PCA (PPCA) for the output distributions of the HMMs. Different configurations of multiple time resolutions are evaluated as well. Compared to the baseline system, a significant reduction of the word error rate can be achieved.
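The multi-resolution derivatives can be illustrated with the standard regression formula for delta coefficients computed over windows of several lengths and concatenated, as in the sketch below; combining by plain concatenation before a decorrelating transform is one plausible configuration, not necessarily the one evaluated in the paper.

```python
# Sketch of delta features at multiple time resolutions: the usual regression
# formula over a +/- K frame window, computed for several K and concatenated.
import numpy as np

def delta(features, K):
    """Regression-based derivative over a window of +/- K frames.

    features: (num_frames, num_coeffs) array of e.g. MFCCs.
    """
    T = len(features)
    padded = np.pad(features, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    d = np.zeros_like(features, dtype=float)
    for k in range(1, K + 1):
        d += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return d / denom

def multi_resolution_features(mfcc, resolutions=(1, 2, 4)):
    """Concatenate static MFCCs with derivatives at several resolutions."""
    return np.hstack([mfcc] + [delta(mfcc, K) for K in resolutions])

mfcc = np.random.default_rng(0).normal(size=(100, 13))    # toy MFCC matrix
feats = multi_resolution_features(mfcc)
print(feats.shape)    # (100, 13 * 4)
```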
Citations: 10
Construction of model-space constraints
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034591
Patrick Nguyen, Luca Rigazio, C. Wellekens, J. Junqua
HMM systems exhibit a large amount of redundancy. To exploit it, a technique called eigenvoices was found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called the eigenspace. This constraint is obtained through a PCA of the training speakers. We show how PCA can be linked to the maximum-likelihood criterion. We then extend the method to LDA transformations and piecewise-linear constraints. On the Wall Street Journal (WSJ) dictation task, we obtain a 1.7% WER improvement (15% relative) when using self-adaptation.
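The eigenvoice constraint can be pictured as follows: stack each training speaker's model means into a supervector, take principal components, and restrict a new speaker's model to the mean voice plus a weighted sum of eigenvoices. In the sketch below the weights are obtained by a least-squares projection standing in for the maximum-likelihood estimation used in practice; all dimensions and data are toy values.

```python
# Sketch of the eigenvoice idea: PCA over training-speaker supervectors, then a
# new speaker is constrained to mean voice + weighted sum of eigenvoices.
import numpy as np

rng = np.random.default_rng(0)
num_speakers, supervector_dim, num_eigenvoices = 50, 200, 5

# Toy training supervectors (one row per training speaker).
supervectors = rng.normal(size=(num_speakers, supervector_dim))

# PCA via SVD of the mean-centered supervectors.
mean_voice = supervectors.mean(axis=0)
_, _, vt = np.linalg.svd(supervectors - mean_voice, full_matrices=False)
eigenvoices = vt[:num_eigenvoices]                 # (num_eigenvoices, supervector_dim)

def adapt(observed_supervector):
    """Project a new speaker onto the eigenspace (least-squares weights)."""
    weights = eigenvoices @ (observed_supervector - mean_voice)
    return mean_voice + weights @ eigenvoices, weights

new_speaker = rng.normal(size=supervector_dim)
adapted_means, w = adapt(new_speaker)
print(w)
```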
Citations: 3
Histogram based normalization in the acoustic feature space
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034579
S. Molau, Michael Pitz, H. Ney
We describe a technique called histogram normalization that aims at normalizing feature-space distributions at different stages in the signal analysis front-end, namely the log-compressed filterbank vectors, the cepstrum coefficients, and the LDA (linear discriminant analysis) transformed acoustic vectors. The best results are obtained at the filterbank, and in most cases there is a minor additional gain when normalization is applied sequentially at different stages. We show that histogram normalization performs best if applied both in training and recognition, and that smoothing the target histogram obtained on the training data is also helpful. On the VerbMobil II corpus, a German large-vocabulary conversational speech recognition task, we achieve an overall relative reduction in word error rate of about 10%.
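Histogram normalization of a single feature dimension can be sketched as CDF matching: map each test value through the empirical CDF of the test data and then through the inverse CDF (quantiles) of the training data. The quantile-interpolation implementation below, without explicit smoothing of the target histogram, is a simplification for illustration.

```python
# Sketch of histogram normalization for one feature dimension via CDF matching.
import numpy as np

def histogram_normalize(test_values, train_values, num_quantiles=100):
    q = np.linspace(0.0, 1.0, num_quantiles)
    train_quantiles = np.quantile(train_values, q)
    # Empirical CDF position of each test value within the test data itself.
    ranks = np.searchsorted(np.sort(test_values), test_values, side="right")
    cdf = ranks / len(test_values)
    # Inverse training CDF: interpolate the target quantile for each position.
    return np.interp(cdf, q, train_quantiles)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=10_000)          # "clean" training distribution
test = rng.normal(1.5, 2.0, size=2_000)            # shifted and scaled test features
normalized = histogram_normalize(test, train)
print(normalized.mean(), normalized.std())         # roughly back to (0, 1)
```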
Citations: 70
ASR in portable wireless devices
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034597
Olli Viikki
This paper discusses the applicability and role of automatic speech recognition in portable wireless devices. Due to the author's background, the viewpoints are somewhat biased towards mobile telephones, but many of the aspects are common to other portable devices as well. While speaker-dependent technology still dominates, there are now signs that ASR in wireless devices is also moving towards speaker-independent systems. As these modern communication devices are usually intended for mass markets, the paper reviews the ASR areas that are relevant for speech recognition on low-cost embedded systems. In particular, multilingual ASR, low-complexity ASR algorithms and their implementation, and acoustic model adaptation techniques play a key role in enabling cost-effective realization of ASR systems. Low complexity and advanced noise robustness are sometimes conflicting goals for ASR algorithms. The paper also briefly reviews some of the most important noise-robust ASR techniques that are well suited for embedded systems.
Citations: 17