
Latest publications: 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

Roles of high-fidelity acoustic modeling in robust speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430075
L. Deng
In this paper I argue that high-fidelity acoustic models have important roles to play in robust speech recognition in the face of the multitude of variability sources ailing many current systems. The discussion of high-fidelity acoustic modeling is posited in the context of general statistical pattern recognition, in which the probabilistic-modeling component that embeds partial, imperfect knowledge is the fundamental building block enabling all other components, including the recognition error measure, decision rule, and training criterion. Within the session's theme of acoustic modeling and robust speech recognition, I advance my argument using two concrete examples. First, an acoustic-modeling framework that embeds the knowledge of articulatory-like constraints is shown to be better able to account for the speech variability arising from varying speaking behavior (e.g., speaking rate and style) than one without the constraints. This higher-fidelity acoustic model is implemented in a multi-layer dynamic Bayesian network, and computer simulation results are presented. Second, the variability in acoustically distorted speech under adverse environments can be more precisely represented and more effectively handled using information about the phase asynchrony between the undistorted speech and the mixing noise than without such information. This high-fidelity, phase-sensitive acoustic distortion model is integrated into the same multi-layer Bayesian network, but at separate, causally related layers from those representing the speaking-behavior variability. Related experimental results in the literature are reviewed, providing empirical support for the significant roles that the phase-sensitive model plays in environment-robust speech recognition.
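The phase-sensitive distortion idea can be illustrated for a single filterbank channel. This is a minimal sketch under simplifying assumptions, not the paper's implementation: `noisy_log_power` is a hypothetical name, and `alpha` stands for the cosine of the phase angle between speech and noise in that channel (`alpha = 0` recovers the usual phase-insensitive log-sum approximation).

```python
import math

def noisy_log_power(log_x, log_n, alpha):
    """Phase-sensitive distortion model for one channel (sketch).

    Combines clean-speech log-power `log_x` and noise log-power
    `log_n`, keeping the cross term 2*alpha*|X||N| that the
    phase-insensitive approximation (alpha = 0) discards.
    """
    px, pn = math.exp(log_x), math.exp(log_n)
    py = px + pn + 2.0 * alpha * math.sqrt(px * pn)
    return math.log(py)

# Sanity check: with alpha = 0 the model reduces to log(e^x + e^n).
assert abs(noisy_log_power(1.0, 0.0, 0.0)
           - math.log(math.exp(1.0) + 1.0)) < 1e-9
```

In-phase noise (`alpha > 0`) yields a larger observed power than the phase-insensitive model predicts, which is the variability the high-fidelity model captures.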
Citations: 4
Interpolation of lost speech segments using LP-HNM model with codebook-mapping post-processing
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430076
E. Zavarehei, S. Vaseghi
This paper presents a method for interpolation of lost speech segments. The short-time spectral amplitude (STSA) of speech is modeled using a linear prediction (LP) model of the spectral envelope and a harmonic plus noise model (HNM) of the excitation. The restoration algorithm is based on interpolation of the parameters of the LP-HNM models of speech from both sides of the gap. A codebook mapping (CBM) technique is used to fit the interpolated parameters to a pre-trained speech model. Experiments show that the CBM module mitigates the artifacts that may result from interpolation of relatively long speech gaps. Evaluations demonstrate that the proposed interpolation method yields superior quality in comparison to alternative restoration methods.
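The core interpolation step can be sketched as a weighted blend of parameter vectors taken from the frames on either side of the gap. This is an illustrative simplification, assuming plain linear interpolation of a generic parameter vector; `interpolate_gap` is a hypothetical helper, and the paper additionally refines the result with codebook mapping.

```python
def interpolate_gap(params_before, params_after, n_missing):
    """Linearly interpolate a model-parameter vector across a lost
    segment, weighting the two sides by proximity to the gap edges."""
    frames = []
    for k in range(1, n_missing + 1):
        w = k / (n_missing + 1.0)  # 0 near the left edge, 1 near the right
        frames.append([(1.0 - w) * a + w * b
                       for a, b in zip(params_before, params_after)])
    return frames
```

A three-frame gap between parameter values 0.0 and 4.0, for example, is filled with 1.0, 2.0, 3.0 before the CBM post-processing would snap each frame to the nearest trained codeword.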
Citations: 4
Robust topic inference for latent semantic language model adaptation
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430105
A. Heidel, Lin-Shan Lee
We perform topic-based, unsupervised language model adaptation under an N-best rescoring framework by using previous-pass system hypotheses to infer a topic mixture which is used to select topic-dependent LMs for interpolation with a topic-independent LM. Our primary focus is on techniques for improving the robustness of topic inference for a given utterance with respect to recognition errors, including the use of ASR confidence and contextual information from surrounding utterances. We describe a novel application of metadata-based pseudo-story segmentation to language model adaptation, and present good improvements to character error rate on multi-genre GALE Project data in Mandarin Chinese.
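The adaptation step described above can be sketched as a two-level interpolation: the inferred topic mixture weights the topic-dependent LMs, and the result is interpolated with the topic-independent LM. This is a toy illustration with hypothetical names (`adapted_lm_prob`, dictionary-backed bigram tables), not the paper's system.

```python
def adapted_lm_prob(history, word, topic_weights, topic_lms,
                    background_lm, lam=0.5):
    """Topic-adapted bigram probability: mix topic-dependent LMs by the
    inferred topic posterior, then interpolate with a background LM."""
    topic_prob = sum(w * lm.get((history, word), 0.0)
                     for w, lm in zip(topic_weights, topic_lms))
    return lam * topic_prob + (1.0 - lam) * background_lm.get((history, word), 0.0)
```

Robustness in the paper's sense comes from how `topic_weights` are inferred from errorful first-pass hypotheses, e.g. down-weighting low-confidence words and pooling context from surrounding utterances.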
Citations: 15
Non-native speech databases
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430148
M. Raab, R. Gruhn, E. Nöth
This paper presents a review of already collected non-native speech databases. Although the number of non-native speech databases is significantly smaller than that of common speech databases, many data collection efforts have already been undertaken at different institutes and companies. Because of the comparatively small size of the databases, many of them are not available through the common distributors of speech corpora such as ELDA or LDC. As a result, it is hard to maintain an overview of which databases have already been collected, and for which purposes no collections yet exist. With this paper we hope to provide a useful resource regarding this issue.
Citations: 25
Factor analysis of acoustic features for streamed hidden Markov modeling
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430079
Chuan-Wei Ting, Jen-Tzung Chien
This paper presents a new streamed hidden Markov model (HMM) framework for speech recognition. Factor analysis (FA) is performed to discover the common factors of the acoustic features. The streaming regularities are governed by the correlation between features, which is inherent in the common factors. Features corresponding to the same factor are generated by the same HMM state. Accordingly, we use multiple Markov chains to represent the variation trends in cepstral features. We develop an FA streamed HMM (FASHMM) and go beyond the conventional HMM, which assumes that all features in a speech frame share the same state emission. This streamed HMM is more fine-grained than the factorial HMM, in which the streaming is determined empirically. We also develop a new decoding algorithm for FASHMM speech recognition. In this manner, we realize flexible Markov chains for an input sequence of multivariate Gaussian mixture observations. In the experiments, the proposed method reduces the word error rate by up to 36%.
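One way to read the FA-based streaming idea is that each feature dimension is routed to the stream of the common factor that dominates it. The toy sketch below groups dimensions by the factor with the largest absolute loading; `assign_streams` is a hypothetical helper and not the paper's estimation procedure, which learns the factors and the HMM jointly.

```python
def assign_streams(loadings):
    """Group feature dimensions into streams by dominant factor.

    `loadings[d][k]` is the FA loading of feature dimension d on
    common factor k; dimension d joins the stream of the factor
    with the largest |loading|, so correlated dimensions end up
    sharing one Markov chain.
    """
    streams = {}
    for d, row in enumerate(loadings):
        k = max(range(len(row)), key=lambda j: abs(row[j]))
        streams.setdefault(k, []).append(d)
    return streams
```

Each resulting stream would then be modeled by its own Markov chain, rather than forcing every dimension of a frame through the same state emission.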
Citations: 7
Investigating the use of speech features and their corresponding distribution characteristics for robust speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430089
Shih-Hsiang Lin, Yao-ming Yeh, Berlin Chen
The performance of current automatic speech recognition (ASR) systems often deteriorates radically when the input speech is corrupted by various kinds of noise sources. Quite a few techniques have been proposed to improve ASR robustness over the last few decades. Related work reported in the literature can generally be divided into two categories, according to whether the methods operate in the feature domain or on the corresponding probability distributions. In this paper, we present a polynomial regression approach which has the merit of directly characterizing the relationship between the speech features and their corresponding probability distributions to compensate for the noise effects. Two variants of the proposed approach are also extensively investigated. All experiments are conducted on the Aurora-2 database and task. Experimental results show that for clean-condition training, our approaches achieve considerable word error rate reductions over the baseline system, and also significantly outperform other conventional methods.
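A regression-based compensation can be sketched in its simplest, degree-1 form: fit a least-squares line mapping noisy feature values to clean targets, then apply it at test time. This is a crude stand-in under stated assumptions (one dimension, first-order polynomial, stereo training data); `fit_linear_compensation` is a hypothetical name, and the paper fits higher-order polynomials against distribution characteristics rather than raw pairs.

```python
def fit_linear_compensation(noisy, clean):
    """Least-squares fit of clean ~= a * noisy + b for one feature
    dimension: a degree-1 instance of polynomial-regression
    compensation.  Returns the slope a and intercept b."""
    n = len(noisy)
    mx = sum(noisy) / n
    my = sum(clean) / n
    sxx = sum((x - mx) ** 2 for x in noisy)
    sxy = sum((x - mx) * (y - my) for x, y in zip(noisy, clean))
    a = sxy / sxx
    b = my - a * mx
    return a, b
```

At recognition time each noisy coefficient would be replaced by `a * x + b`, pulling its distribution back toward the clean-training conditions.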
Citations: 1
Bayesian adaptation in HMM training and decoding using a mixture of feature transforms
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430133
S. Tsakalidis, S. Matsoukas
Adaptive training under a Bayesian framework addresses some limitations of the standard maximum likelihood approaches. Also, the adaptively trained system can be directly used in unsupervised inference. The Bayesian framework uses a distribution of the transform rather than a point estimate. A continuous transform distribution makes the integral associated with the Bayesian framework intractable and therefore various approximations have been proposed. In this paper we model the transform distribution via a mixture of transforms. Under this model, the likelihood of an utterance is computed as a weighted sum of the likelihoods obtained by transforming its features based on each of the transforms in the mixture, with weights set to the transform priors. Experimental results on Arabic broadcast news exhibit increased likelihood on acoustic training data and improved speech recognition performance on unseen test data, compared to speaker independent and standard adaptive models.
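The weighted-sum likelihood described above can be sketched in one dimension: each affine transform (a, b) in the mixture maps the feature, contributes its Jacobian |a|, and is weighted by its prior. This is an illustrative reduction with hypothetical names (`gauss_pdf`, `mixture_transform_likelihood`); the actual system uses full transform matrices and HMM state distributions.

```python
import math

def gauss_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_transform_likelihood(x, transforms, priors, mean, var):
    """Likelihood of observation x as a prior-weighted sum over affine
    feature transforms (a, b), each contributing |a| * N(a*x + b; mean, var).
    The |a| factor is the change-of-variables Jacobian."""
    return sum(p * abs(a) * gauss_pdf(a * x + b, mean, var)
               for (a, b), p in zip(transforms, priors))
```

With a single identity transform of prior 1 the expression collapses to the unadapted likelihood, which makes the mixture a strict generalization of the point-estimate approach.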
Citations: 0
Refine bigram PLSA model by assigning latent topics unevenly
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430099
Jiazhong Nie, Runxin Li, D. Luo, Xihong Wu
As an important component in many speech and language processing applications, the statistical language model has been widely investigated. The bigram topic model, which combines the advantages of both the traditional n-gram model and the topic model, turns out to be a promising language modeling approach. However, the original bigram topic model assigns the same number of topics to every context word, ignoring the fact that the latent semantics of context words differ in complexity. We present a new bigram topic model, the bigram PLSA model, and propose a modified training strategy that assigns latent topics to context words unevenly, according to an estimate of their latent semantic complexities. As a consequence, a refined bigram PLSA model is reached. Experiments on HUB4 Mandarin test transcriptions reveal the superiority over existing models, and further perplexity improvements are achieved through the use of the refined bigram PLSA model.
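The bigram PLSA decomposition can be sketched as P(w | v) = sum_t P(t | v) P(w | t), where the per-context topic distribution may carry a different number of topics per context word, mirroring the uneven assignment. A minimal sketch with hypothetical dictionary-backed tables (`bigram_plsa_prob` is not the paper's trainer, which fits these tables by EM):

```python
def bigram_plsa_prob(word, context, p_topic_given_context, p_word_given_topic):
    """Bigram PLSA probability P(word | context) = sum over latent
    topics t of P(t | context) * P(word | t).  Each context's topic
    dict may have a different size (uneven topic assignment)."""
    return sum(pt * p_word_given_topic[t].get(word, 0.0)
               for t, pt in p_topic_given_context[context].items())
```

A semantically rich context word would simply carry a larger topic dict than a function word, at no change to the scoring code.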
Citations: 12
Reranking machine translation hypotheses with structured and web-based language models
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430102
Wen Wang, A. Stolcke, Jing Zheng
In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system. These language models, developed from constraint dependency grammar parses, tightly integrate knowledge of words, morphological and lexical features, and syntactic dependency constraints. Two structured language models are applied for N-best rescoring: one is an almost-parsing language model, and the other utilizes more syntactic features by explicitly modeling syntactic dependencies between words. We also investigate effective and efficient language modeling methods that use N-grams extracted from up to 1 teraword of web documents. We apply all these language models for N-best reranking on the NIST and DARPA GALE program 2006 and 2007 machine translation evaluation tasks and find that the combination of these language models increases the BLEU score by up to 1.6% absolute on blind test sets.
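The reranking itself reduces to scoring each hypothesis with a weighted sum of model scores (e.g. translation model, baseline LM, structured LM, web-based LM) and keeping the maximizer. A minimal sketch with hypothetical field names (`rerank_nbest`, a `scores` list per hypothesis); in practice the weights would be tuned on a development set:

```python
def rerank_nbest(hypotheses, weights):
    """Return the N-best entry maximizing the weighted sum of its
    feature scores; each hypothesis carries one score per model."""
    return max(hypotheses,
               key=lambda h: sum(w * s for w, s in zip(weights, h["scores"])))
```

Adding a structured-LM score to the list changes only the length of `weights`, which is what makes this framework convenient for combining heterogeneous models.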
Citations: 22
Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430182
N. Kitaoka, Kazumasa Yamamoto, Tomohiro Kusamizu, S. Nakagawa, Takeshi Yamada, S. Tsuge, C. Miyajima, T. Nishiura, M. Nakayama, Y. Denda, M. Fujimoto, T. Takiguchi, S. Tamura, S. Kuroiwa, K. Takeda, Satoshi Nakamura
Voice activity detection (VAD) plays an important role in speech processing, including speech recognition, speech enhancement, and speech coding in noisy environments. We developed an evaluation framework for VAD in such environments, called Corpus and Environment for Noisy Speech Recognition 1 Concatenated (CENSREC-1-C). This framework consists of noisy continuous digit utterances and evaluation tools for VAD results. By adopting two evaluation measures, one for frame-level detection performance and the other for utterance-level detection performance, we provide the evaluation results of a power-based VAD method as a baseline. When using VAD in a speech recognizer, the detected speech segments are extended to avoid the loss of speech frames, and the pause segments are then absorbed by a pause model. We investigate the balance between an explicit segmentation by VAD and an implicit segmentation by a pause model using an experimental simulation of segment extension, and show that a small extension improves speech recognition.
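A power-based baseline with segment extension can be sketched as simple energy thresholding followed by widening each detected segment by a few frames on both sides. This is an illustrative sketch, not the CENSREC-1-C baseline itself; `power_vad` and its `extend` parameter are hypothetical names.

```python
def power_vad(frame_energies, threshold, extend=2):
    """Power-based VAD sketch: flag frames whose energy exceeds the
    threshold as speech, then extend each detected region by `extend`
    frames on both sides so that weak onsets/offsets are not lost."""
    n = len(frame_energies)
    speech = [e > threshold for e in frame_energies]
    out = speech[:]
    for i, is_speech in enumerate(speech):
        if is_speech:
            for j in range(max(0, i - extend), min(n, i + extend + 1)):
                out[j] = True
    return out
```

The extended boundaries deliberately leak some silence into the speech segments; as the abstract notes, a pause model in the recognizer then absorbs that slack, which is why a small extension helps.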
引用次数: 29
期刊
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)