IEEE Transactions on Audio Speech and Language Processing: Latest Articles

Automatic Accent Assessment Using Phonetic Mismatch and Human Perception
Pub Date : 2013-09-01 DOI: 10.1109/TASL.2013.2258011
F. William, A. Sangwan, J. Hansen
In this study, a new algorithm for automatic accent evaluation of native and non-native speakers is presented. The proposed system consists of two main steps: alignment and scoring. In the alignment step, the speech utterance is processed using a Weighted Finite State Transducer (WFST) based technique to automatically estimate the pronunciation mismatches (substitutions, deletions, and insertions). Subsequently, in the scoring step, two scoring systems that utilize the pronunciation mismatches from the alignment phase are proposed: (i) a WFST-scoring system to measure the degree of accentedness on a scale from -1 (non-native-like) to +1 (native-like), and (ii) a Maximum Entropy (ME) based technique to assign perceptually motivated scores to pronunciation mismatches. The accent scores provided by the WFST-scoring system and the ME scoring system are termed the WFST and P-WFST (perceptual WFST) accent scores, respectively. The proposed systems are evaluated on American English (AE) spoken by native and non-native (native speakers of Mandarin Chinese) speakers from the CU-Accent corpus. A listener evaluation by 50 native American English (N-AE) listeners was employed to assist in validating the performance of the proposed accent assessment systems. The proposed P-WFST algorithm shows higher and more consistent correlation with human-evaluated accent scores than the Goodness Of Pronunciation (GOP) measure. The proposed solution for accent classification and assessment based on WFST and P-WFST scores shows that an effective advancement is possible that correlates well with human perception.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1818-1829
Citations: 12
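The alignment step in the abstract above estimates substitutions, deletions, and insertions between an expected and an observed phone sequence. The paper does this with a WFST; the sketch below is only a stand-in that uses a plain minimum-edit-distance alignment to show what the three mismatch counts mean. The function name `align_counts` and the phone symbols in the usage note are hypothetical and do not come from the paper.

```python
def align_counts(ref, hyp):
    """Count substitutions, deletions and insertions in a minimum-edit alignment.

    ref: expected (canonical) phone sequence; hyp: observed phone sequence.
    This is ordinary dynamic programming, assumed here as a simple substitute
    for the WFST-based alignment named in the abstract.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the end of both sequences and classify each step.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            subs += int(mismatch)   # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1               # phone in ref missing from hyp
            i -= 1
        else:
            ins += 1                # extra phone in hyp
            j -= 1
    return {"substitutions": subs, "deletions": dels, "insertions": ins}
```

For instance, `align_counts(["th", "ih", "s"], ["s", "ih", "s"])` reports one substitution. Turning such counts into an accent score is a separate step that this sketch does not attempt.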
Nonlinear Acoustic Echo Cancellation Based on a Sliding-Window Leaky Kernel Affine Projection Algorithm
Pub Date : 2013-09-01 DOI: 10.1109/TASL.2013.2260742
Jose Manuel Gil-Cacho, M. Signoretto, T. Waterschoot, M. Moonen, S. H. Jensen
Acoustic echo cancellation (AEC) is used in speech communication systems where the existence of echoes degrades the speech intelligibility. Standard approaches to AEC rely on the assumption that the echo path to be identified can be modeled by a linear filter. However, some elements introduce nonlinear distortion and must be modeled as nonlinear systems. Several nonlinear models have been used with varying success. The kernel affine projection algorithm (KAPA) has been successfully applied to many areas in signal processing but not yet to nonlinear AEC (NLAEC). The contribution of this paper is three-fold: (1) to apply KAPA to the NLAEC problem, (2) to develop a sliding-window leaky KAPA (SWL-KAPA) that is well suited for NLAEC applications, and (3) to propose a kernel function consisting of a weighted sum of a linear and a Gaussian kernel. In our experimental set-up, the proposed SWL-KAPA for NLAEC consistently outperforms the linear APA, resulting in up to 12 dB of improvement in ERLE at a computational cost that is only 4.6 times higher. Moreover, it is shown that the SWL-KAPA outperforms, by 4-6 dB, a Volterra-based NLAEC, which itself has a computational cost 413 times higher than that of the linear APA.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1867-1878
Citations: 58
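The third contribution listed in the abstract above is a kernel formed as a weighted sum of a linear and a Gaussian kernel. A minimal sketch of such a kernel follows. The weights `alpha` and `beta`, the bandwidth `sigma`, and their default values are assumptions chosen for illustration; the abstract does not state the paper's parameterization.

```python
import math


def mixed_kernel(x, y, alpha=0.5, beta=0.5, sigma=1.0):
    """Weighted sum of a linear kernel and a Gaussian kernel.

    k(x, y) = alpha * <x, y> + beta * exp(-||x - y||^2 / (2 * sigma^2))
    alpha, beta and sigma are illustrative parameters, not values from the paper.
    """
    linear = sum(a * b for a, b in zip(x, y))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    gaussian = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return alpha * linear + beta * gaussian
```

With non-negative weights the sum of two positive semi-definite kernels is again positive semi-definite, which is one reason a combination like this can be used wherever a single kernel is expected.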
Sound Source Distance Estimation in Rooms based on Statistical Properties of Binaural Signals
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2260155
Eleftheria Georganti, T. May, S. Par, J. Mourjopoulos
A novel method for the estimation of the distance of a sound source from binaural speech signals is proposed. The method relies on several statistical features extracted from such signals and their binaural cues. Firstly, the standard deviation of the difference of the magnitude spectra of the left and right binaural signals is used as a feature for this method. In addition, an extended set of additional statistical features that can improve distance detection is extracted from an auditory front-end which models the peripheral processing of the human auditory system. The method incorporates the above features into two classification frameworks based on Gaussian mixture models and Support Vector Machines and the relative merits of those frameworks are evaluated. The proposed method achieves distance detection when tested in various acoustical environments and performs well in unknown environments. Its performance is also compared to an existing binaural distance detection method.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1727-1741
Citations: 36
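The first feature named in the abstract above is the standard deviation of the difference between the left and right magnitude spectra. The sketch below computes one such value for a single frame. Taking the difference in decibels, applying a Hann window, and the function name `spectral_difference_std` are all assumptions made for this illustration; the abstract does not specify these details.

```python
import numpy as np


def spectral_difference_std(left, right, eps=1e-12):
    """Std. dev. across frequency of the left/right magnitude-spectrum difference.

    left, right: equal-length 1-D arrays holding one frame of each ear signal.
    The difference is taken between log-magnitude spectra in dB (an assumption).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    window = np.hanning(len(left))
    mag_left = np.abs(np.fft.rfft(window * left))
    mag_right = np.abs(np.fft.rfft(window * right))
    # eps guards against log10(0) in empty frequency bins.
    diff_db = 20.0 * np.log10(mag_left + eps) - 20.0 * np.log10(mag_right + eps)
    return float(np.std(diff_db))
```

A pure level difference between the ears shifts every bin by the same amount and so leaves this value near zero, while frequency-dependent differences between the two ear signals raise it.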
Bayesian Feature Enhancement for Reverberation and Noise Robust Speech Recognition
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2258013
Volker Leutnant, A. Krueger, Reinhold Häb-Umbach
In this contribution, we extend a previously proposed Bayesian approach for the enhancement of reverberant logarithmic mel power spectral coefficients for robust automatic speech recognition to the additional compensation of background noise. A recently proposed observation model is employed whose time-variant observation error statistics are obtained as a side product of the inference of the a posteriori probability density function of the clean speech feature vectors. Furthermore, a reduction of the computational effort and the memory requirements is achieved by using a recursive formulation of the observation model. The performance of the proposed algorithms is first experimentally studied on a connected digits recognition task with artificially created noisy reverberant data. It is shown that the use of the time-variant observation error model leads to a significant error rate reduction at low signal-to-noise ratios compared to a time-invariant model. Further experiments were conducted on a 5000-word task recorded in a reverberant and noisy environment. A significant word error rate reduction was obtained, demonstrating the effectiveness of the approach on real-world data.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1640-1652
Citations: 9
A General Compression Approach to Multi-Channel Three-Dimensional Audio
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2260156
B. Cheng, C. Ritz, I. Burnett, Xiguang Zheng
This paper presents a technique for low bit rate compression of three-dimensional (3D) audio produced by multiple loudspeaker channels. The approach is based on the time-frequency analysis of the localization of spatial sound sources within the 3D space as rendered by a multi-channel audio signal (in this case 16 channels). This analysis results in the derivation of a stereo downmix signal representing the original 16 channels. Alternatively, a mono-downmix signal with side information representing the location of sound sources within the 3D spatial scene can also be derived. The resulting downmix signals are then compressed with a traditional audio coder, resulting in a representation of the 3D soundfield at bit rates comparable with existing stereo audio coders while maintaining the perceptual quality produced from separate encoding of each channel.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1676-1688
Citations: 16
Sparse Classifier Fusion for Speaker Verification
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2256895
Ville Hautamäki, T. Kinnunen, Filip Sedlak, Kong-Aik Lee, B. Ma, Haizhou Li
State-of-the-art speaker verification systems take advantage of a number of complementary base classifiers by fusing them to arrive at reliable verification decisions. In speaker verification, fusion is typically implemented as a weighted linear combination of the base classifier scores, where the combination weights are estimated using a logistic regression model. An alternative approach to fusion is classifier ensemble selection, which can be seen as sparse regularization applied to logistic regression. Even though score fusion has been extensively studied in speaker verification, classifier ensemble selection has received much less attention. In this study, we extensively study sparse classifier fusion on a collection of twelve I4U spectral subsystems on the NIST 2008 and 2010 speaker recognition evaluation (SRE) corpora.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1622-1631
Citations: 48
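The abstract above describes fusion as a weighted linear combination of base classifier scores, with weights estimated by logistic regression, and ensemble selection as sparse regularization of that regression. The sketch below illustrates the idea with an L1 penalty minimized by proximal gradient descent (ISTA). The choice of L1, the optimizer, the function names, and the default hyperparameters are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each weight toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_sparse_fusion(scores, labels, l1=0.1, lr=0.1, n_iter=2000):
    """Fit fusion weights by L1-regularized logistic regression (ISTA).

    scores: (n_trials, n_classifiers) base classifier scores.
    labels: (n_trials,) array with 1.0 for target and 0.0 for non-target trials.
    Returns (weights, bias); weights driven to exactly zero drop that classifier.
    """
    n_trials, n_classifiers = scores.shape
    w = np.zeros(n_classifiers)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))  # predicted target probability
        grad_w = scores.T @ (p - labels) / n_trials
        grad_b = float(np.mean(p - labels))
        w = soft_threshold(w - lr * grad_w, lr * l1)  # gradient step, then shrink
        b -= lr * grad_b                              # bias is not penalized
    return w, b


def fuse(scores, w, b):
    """Fused score: weighted linear combination of base scores plus a bias."""
    return scores @ w + b
```

In use, a classifier whose score carries no information about the label tends to receive a weight of exactly zero, which is the selection effect the abstract refers to. Larger values of `l1` remove more classifiers.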
Syntax-Based Translation With Bilingually Lexicalized Synchronous Tree Substitution Grammars
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2255283
Jiajun Zhang, Feifei Zhai, Chengqing Zong
Syntax-based models can significantly improve translation performance due to their grammatical modeling on one or both language sides. However, translation rules such as the non-lexical rule “VP → (x0 x1, VP:x1 PP:x0)” in string-to-tree models do not consider any lexicalized information on the source or target side. The rule is so general that any subtree rooted at VP can substitute for the nonterminal VP:x1. Because rules containing nonterminals are frequently used when generating the target-side tree structures, there is a risk that rules of this type will be severely misused in decoding due to a lack of lexicalization guidance. In this article, inspired by lexicalized PCFG, which is widely used in monolingual parsing, we propose to upgrade the STSG (synchronous tree substitution grammar)-based syntax translation model with bilingually lexicalized STSG. Using the string-to-tree translation model as a case study, we present generative and discriminative models to integrate lexicalized STSG into the translation model. Both small- and large-scale experiments on Chinese-to-English translation demonstrate that the proposed lexicalized STSG can provide superior rule selection in decoding and substantially improve the translation quality.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1586-1597
Citations: 6
A Class of Algorithms for Time-Frequency Multiplier Estimation
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2255274
Anaïk Olivero, B. Torrésani, R. Kronland-Martinet
We propose here a new approach, together with a corresponding class of algorithms, for offline estimation of linear operators mapping input to output signals. The operators are modeled as multipliers, i.e., linear and diagonal operators in a frame or Bessel representation of signals (such as Gabor or wavelet representations), and characterized by a transfer function. The estimation problem is formulated as a regularized inverse problem and solved using iterative algorithms based on gradient descent schemes. Various estimation problems, which differ in the choice of the regularization function, are studied in the case of Gabor multipliers. The transfer function provides a meaningful interpretation of the differences between the two signals or signal classes under consideration, and examples are discussed. Furthermore, examples of signal transformations with such Gabor transfer functions are also given.
IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1550-1559
Citations: 25
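The abstract above models the operator as a diagonal multiplier acting on time-frequency coefficients and estimates it through a regularized inverse problem. The sketch below shows a deliberately simplified version of that idea: it works directly on given coefficient arrays, uses a quadratic (Tikhonov) penalty, and solves each coefficient in closed form. This is an assumption-laden illustration, not the paper's algorithms, which operate through the analysis and synthesis operators, use iterative gradient schemes, and consider several regularizers. The names `estimate_multiplier` and `lam` are hypothetical.

```python
import numpy as np


def estimate_multiplier(x_coeffs, y_coeffs, lam=1e-3):
    """Per-coefficient regularized estimate of a diagonal multiplier.

    Minimizes |y - m * x|^2 + lam * |m|^2 independently for every
    time-frequency coefficient, which gives
        m = conj(x) * y / (|x|^2 + lam).
    x_coeffs, y_coeffs: complex arrays of input and output coefficients
    (for example, short-time Fourier coefficients) with the same shape.
    """
    x_coeffs = np.asarray(x_coeffs, dtype=complex)
    y_coeffs = np.asarray(y_coeffs, dtype=complex)
    return np.conj(x_coeffs) * y_coeffs / (np.abs(x_coeffs) ** 2 + lam)
```

The penalty weight `lam` keeps the estimate bounded where the input has little energy: there the multiplier shrinks toward zero instead of dividing by a near-zero coefficient.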
The Spectral Nature of Maximum Likelihood Noise Compensated Linear Prediction
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2255277
L. Weruaga, L. Dimitrov
The study of the effects of noise in autoregressive (AR) analysis (or linear prediction), and of its compensation (NCAR), has commonly been carried out in the time domain under the least squares (LS) criterion. This paper studies the adequacy of such an approach by means of a comparative analysis with selected frequency-based NCAR methods. In particular, the maximization of the spectral likelihood (ML) results in a proper optimization problem that is easy to solve and brings useful insights into the rationale of the NCAR problem. By contrast, popular time-based NCAR methods are shown in the paper to be designed, in the ML context, around ill-conditioned criteria, requiring constraints to guarantee stable solutions. A statistical analysis of a realistic scenario and a speech enhancement experiment complement this analysis.
{"title":"The Spectral Nature of Maximum Likelihood Noise Compensated Linear Prediction","authors":"L. Weruaga, L. Dimitrov","doi":"10.1109/TASL.2013.2255277","DOIUrl":"https://doi.org/10.1109/TASL.2013.2255277","url":null,"abstract":"The effects of noise in autoregressive (AR) analysis (or linear prediction) and its compensation (NCAR) has been commonly carried out in the time domain under the least square (LS) criterion. This paper studies the adequacy of such an approach by means of a comparative analysis with selected frequency-based NCAR methods. In particular, the maximization of the spectral likelihood (ML) results in a proper optimization problem that is easy to solve and brings useful insights into the rationale of the NCAR problem. On the contrary, popular time-based NCAR methods are shown in the paper to be designed, in the ML context, around ill-conditioned criteria, requiring constraints to guarantee stable solutions. The statistical analysis on a realistic scenario as well as an experiment on speech enhancement complement this analysis.","PeriodicalId":55014,"journal":{"name":"IEEE Transactions on Audio Speech and Language Processing","volume":"21 1","pages":"1760-1765"},"PeriodicalIF":0.0,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TASL.2013.2255277","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62888620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
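As a rough illustration of the time-domain, least-squares style of noise compensation that the abstract by Weruaga and Dimitrov contrasts with spectral ML methods, the sketch below solves the Yule-Walker equations with the Levinson-Durbin recursion and compensates for additive white noise by subtracting an assumed-known noise variance from the zero-lag autocorrelation. This is a textbook construction, not the paper's method; the function names, the AR(2) coefficients and the noise variance of 0.5 are illustrative assumptions.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an AR model of the given order.

    Convention assumed in this sketch: x[n] = sum_k a[k] * x[n-k] + e[n].
    Returns (list of a[1..order], prediction error power).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def compensate_autocorrelation(r, noise_var):
    """Remove additive white noise from an autocorrelation sequence.

    White noise only affects lag 0, so only r[0] is corrected.
    noise_var is assumed to be known (hypothetical in this sketch).
    """
    return [r[0] - noise_var] + list(r[1:])


# Exact autocorrelation of a hypothetical AR(2) process, normalized to r[0] = 1.
a1, a2 = 0.5, -0.3
rho1 = a1 / (1.0 - a2)
rho2 = a1 * rho1 + a2
clean_r = [1.0, rho1, rho2]

# Observing the process in white noise adds the noise variance at lag 0 only.
noise_var = 0.5
noisy_r = [clean_r[0] + noise_var] + clean_r[1:]

biased, _ = levinson_durbin(noisy_r, 2)
compensated, _ = levinson_durbin(compensate_autocorrelation(noisy_r, noise_var), 2)
print("uncompensated:", biased)
print("compensated:  ", compensated)
```

If `noise_var` is overestimated, the corrected `r[0]` can fall below `|r[1]|`, and the recursion then yields a reflection coefficient with magnitude of at least 1, which is one way to see why such time-domain schemes need constraints to remain stable.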
Broadband DOA Estimation Using Sensor Arrays on Complex-Shaped Rigid Bodies
Pub Date : 2013-08-01 DOI: 10.1109/TASL.2013.2255282
Dumidu S. Talagala, Wen Zhang, T. Abhayapala
Sensor arrays mounted on complex-shaped rigid bodies are a common feature in many practical broadband direction-of-arrival (DOA) estimation applications. The scattering and reflections caused by these rigid bodies introduce complexity and diversity in the frequency domain of the channel transfer function, which presents several challenges to existing broadband DOA estimators. This paper presents a novel high-resolution broadband DOA estimation technique based on signal subspace decomposition. We describe how broadband signals can be decomposed into narrow subband components and combined such that the frequency-domain diversity is retained. The DOA estimation performance is compared with existing techniques using a uniform circular array and a sensor array on a hypothetical rigid body. An improvement of up to 6 dB in the resolution of closely spaced sources is observed for the sensor array on the hypothetical rigid body, in comparison to the uniform circular array. The results suggest that frequency-domain diversity, introduced by complex-shaped rigid bodies, can provide higher resolution and clearer separation of closely spaced broadband sound sources.
Journal: IEEE Transactions on Audio Speech and Language Processing, vol. 21, pp. 1573-1585
Citations: 10
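The generic idea of splitting a broadband signal into narrow subbands and combining per-subband DOA information, mentioned in the abstract by Talagala, Zhang and Abhayapala, can be sketched as follows. This is not the paper's subspace method and it ignores rigid-body scattering: it swaps in a conventional (Bartlett) beamformer on a free-field uniform circular array and simply averages the normalized per-subband spectra. The array geometry, sampling rate, DFT bins, source azimuth and all function names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def uca_delays(azimuth_deg, num_sensors, radius):
    """Arrival-time advances (s) of a far-field plane wave at each sensor
    of a free-field uniform circular array, relative to the array centre."""
    theta = math.radians(azimuth_deg)
    return [radius * math.cos(theta - 2.0 * math.pi * m / num_sensors) / SPEED_OF_SOUND
            for m in range(num_sensors)]


def dft_bin(signal, k):
    """Single DFT coefficient: the narrowband (subband) component at bin k."""
    n_samples = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, x in enumerate(signal))


def broadband_bartlett(signals, fs, bins, radius, grid_deg):
    """Sum of normalized Bartlett spectra over the selected subbands."""
    num_sensors = len(signals)
    n_samples = len(signals[0])
    spectrum = [0.0] * len(grid_deg)
    for k in bins:
        freq = k * fs / n_samples
        snapshot = [dft_bin(sig, k) for sig in signals]
        # Normalize so that each subband contributes at most 1.
        norm = num_sensors * sum(abs(x) ** 2 for x in snapshot)
        for i, az in enumerate(grid_deg):
            steer = [cmath.exp(2j * math.pi * freq * tau)
                     for tau in uca_delays(az, num_sensors, radius)]
            inner = sum(s.conjugate() * x for s, x in zip(steer, snapshot))
            spectrum[i] += abs(inner) ** 2 / norm
    return spectrum


# Hypothetical scenario: 8 sensors, 10 cm radius, one source at 60 degrees
# emitting three bin-centred tones (500, 1000 and 2000 Hz).
fs, n_samples, bins = 8000.0, 64, [4, 8, 16]
num_sensors, radius, true_az = 8, 0.1, 60
signals = [[sum(math.cos(2.0 * math.pi * (k * fs / n_samples) * (n / fs + tau))
                for k in bins)
            for n in range(n_samples)]
           for tau in uca_delays(true_az, num_sensors, radius)]

grid = list(range(360))
spectrum = broadband_bartlett(signals, fs, bins, radius, grid)
estimated_az = grid[max(range(len(grid)), key=spectrum.__getitem__)]
print("estimated azimuth (degrees):", estimated_az)
```

In this free-field sketch every subband uses the same geometric steering model; the abstract's point is that a rigid body makes the channel response differ across frequency, and that this diversity can be exploited rather than averaged away.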