
IEEE Transactions on Audio Speech and Language Processing: Latest Articles

On the Time-Domain Widely Linear LCMV Filter for Noise Reduction With a Stereo System
Pub Date : 2013-07-01 DOI: 10.1109/TASL.2013.2248719
Jingdong Chen, J. Benesty
This paper deals with the problem of noise reduction in stereo sound systems, where the objective is not only to reduce noise but also to preserve the spatial information of both the desired speech and noise sources, so that the listener can still localize the speech and noise sources when listening to the enhanced binaural outputs. To achieve this objective, we use a previously developed widely linear (WL) framework and convert the problem of binaural noise reduction into one of monaural filtering with complex signals. We then present a way to decompose both the complex speech and noise signal vectors into two orthogonal components: one correlated and the other uncorrelated with the corresponding current signal sample. With this decomposition, noise reduction with preservation of the spatial information of the speech and noise sources is formulated as an optimization problem with two constraints: one on the desired speech and the other on the preservation of the noise signal. We then derive a WL linearly constrained minimum variance (LCMV) filter, which can exploit the statistics and noncircularity of the complex speech signal to achieve noise reduction. The previously developed WL Wiener and minimum variance distortionless response (MVDR) filters can only preserve the characteristics and spatial information of the desired sound source. In contrast, the new WL LCMV filter has the potential to reduce noise while simultaneously preserving the characteristics and spatial information of both the desired and noise sources. Experimental results are provided to support the claimed merits of the proposed WL LCMV filter.
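The abstract rests on two building blocks. First, the left and right channels are folded into one complex signal. Second, that signal is filtered widely linearly, that is, using both x and its conjugate. The sketch below illustrates only these two ideas. It is not the WL LCMV filter itself, and the function names are hypothetical.

The circularity quotient |E[x²]| / E[|x|²] is one common way to quantify noncircularity. It is near 0 when the two channels are uncorrelated with equal power, and it reaches 1 when they are identical.

```python
def stereo_to_complex(left, right):
    """Pack a stereo pair into one complex signal x(t) = left(t) + j * right(t)."""
    return [complex(l, r) for l, r in zip(left, right)]


def circularity_quotient(x):
    """|E[x^2]| / E[|x|^2] for a zero-mean signal: 0 if circular, 1 if maximally noncircular."""
    n = len(x)
    variance = sum(v.real ** 2 + v.imag ** 2 for v in x) / n
    pseudo_variance = sum(v * v for v in x) / n
    return abs(pseudo_variance) / variance


def wl_filter(h, h_conj, window):
    """Widely linear output y = h^H x + h_conj^H x*, with window = [x(t), x(t-1), ...]."""
    return sum(a.conjugate() * v + b.conjugate() * v.conjugate()
               for a, b, v in zip(h, h_conj, window))


# Identical channels: x = l(1 + j) lies on a line in the complex plane (maximally noncircular).
print(circularity_quotient(stereo_to_complex([1, -2, 3, -1], [1, -2, 3, -1])))  # 1.0
# A WL filter can return the real part (the left channel); no strictly linear filter h^H x can.
print(wl_filter([0.5], [0.5], [complex(3, 4)]))  # (3+0j)
```

The second example shows why the conjugate term matters. A strictly linear filter cannot extract one channel of the packed signal on its own, but a WL filter can.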
Citations: 17
Ranking Through Clustering: An Integrated Approach to Multi-Document Summarization
Pub Date : 2013-07-01 DOI: 10.1109/TASL.2013.2253098
Xiaoyan Cai, Wenjie Li
Multi-document summarization aims to create a condensed summary while retaining the main characteristics of the original set of documents. Against this background, sentence ranking has so far been the issue of most concern. Documents often cover a number of topic themes, each represented by a cluster of highly related sentences, so sentence clustering has been explored in the literature in order to provide more informative summaries. For each topic theme, the ranking of terms conditional on that theme should be distinctive and quite different from the ranking of terms in other themes. Existing cluster-based summarization approaches apply clustering and ranking in isolation, which leads to incomplete, or sometimes rather biased, analytical results. A newly emerged framework uses sentence clustering results to improve or refine sentence ranking results. Within this framework, this paper proposes a novel approach that directly generates clusters integrated with ranking. The basic idea is that the ranking distributions of sentences in different clusters should differ considerably from each other. These distributions can therefore serve as cluster features, from which new sentence clustering measures can be calculated. Meanwhile, better clustering results yield better ranking results. Ranking and clustering are therefore updated mutually and simultaneously, so that the performance of both improves. The effectiveness of the proposed approach is demonstrated by both a cluster quality analysis and a summarization evaluation on the DUC 2004-2007 datasets.
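The abstract does not specify the clustering measure, so what follows is only a toy analogue of the mutual-update idea, with hypothetical names. Each cluster is described by a smoothed term distribution, which serves as its term ranking. Sentences are reassigned to the cluster whose distribution explains them best, and the final scores rank the sentences within each cluster. The loop alternates until the assignment stops changing.

```python
import math
from collections import Counter


def term_distributions(sentences, labels, k, vocab, alpha=0.1):
    """Per-cluster term distribution (additively smoothed); its ordering is the cluster's term ranking."""
    dists = []
    for c in range(k):
        counts = Counter(w for s, lab in zip(sentences, labels) if lab == c for w in s)
        total = sum(counts.values()) + alpha * len(vocab)
        dists.append({w: (counts[w] + alpha) / total for w in vocab})
    return dists


def score(sentence, dist):
    """Average log-probability of a sentence's terms under one cluster's distribution."""
    return sum(math.log(dist[w]) for w in sentence) / len(sentence)


def rank_through_clustering(sentences, labels, k, max_iter=20):
    vocab = sorted({w for s in sentences for w in s})
    for _ in range(max_iter):
        dists = term_distributions(sentences, labels, k, vocab)
        # Clustering step: move each sentence to the cluster whose term ranking fits it best
        new_labels = [max(range(k), key=lambda c: score(s, dists[c])) for s in sentences]
        if new_labels == labels:
            break
        labels = new_labels
    # Ranking step: order the sentences inside each final cluster by their score
    dists = term_distributions(sentences, labels, k, vocab)
    ranking = {
        c: sorted((i for i, lab in enumerate(labels) if lab == c),
                  key=lambda i: score(sentences[i], dists[c]), reverse=True)
        for c in range(k)
    }
    return labels, ranking


docs = [s.split() for s in ["cat dog pet", "dog cat animal", "stock market price", "market price trade"]]
labels, ranking = rank_through_clustering(docs, [0, 0, 0, 1], k=2)
print(labels)  # [0, 0, 1, 1]
```

Starting from a poor initial assignment, the cluster-level term rankings pull the two finance sentences together after one update.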
Citations: 51
Performance of the SDW-MWF With Randomly Located Microphones in a Reverberant Enclosure
Pub Date : 2013-07-01 DOI: 10.1109/TASL.2013.2255280
S. M. Golan, S. Gannot, I. Cohen
Beamforming with wireless acoustic sensor networks (WASNs) has recently drawn the attention of the research community. As the number of microphones grows, it becomes difficult, and in some applications impossible, to determine their layout beforehand. A common practice for analyzing the expected performance is to rely on statistical considerations. In this contribution, we consider applying the speech distortion weighted multi-channel Wiener filter (SDW-MWF) to enhance a desired source propagating in a reverberant enclosure in which the microphones are randomly located according to a uniform distribution. Two noise fields are considered: multiple coherent interference signals and a diffuse sound field. Using the statistics of the acoustic transfer function (ATF), we derive a statistical model for two important criteria of the beamformer (BF): the signal-to-interference ratio (SIR) and the white noise gain. Moreover, we propose reliability functions, which give the probability that the SIR and the white noise gain exceed a predefined level. We verify the proposed model with an extensive simulation study.
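A reliability function can also be estimated empirically by Monte Carlo simulation. The sketch below is a deliberately simplified stand-in for the paper's analytical model. It assumes:

- a single desired source;
- spatially white noise instead of coherent interferers or a diffuse field;
- i.i.d. circular complex Gaussian ATF entries, as a crude model of random microphone positions in a reverberant room.

Under these assumptions, the rank-1 SDW-MWF (R_s + μR_n)⁻¹R_s e₁ reduces by the Sherman-Morrison formula to a scaled copy of the ATF vector a. Its white noise gain is therefore ‖a‖², regardless of μ. All function names are hypothetical.

```python
import random


def random_atf(m, rng):
    """Hypothetical ATF model: i.i.d. circular complex Gaussian entries with unit variance."""
    s = 0.5 ** 0.5
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(m)]


def sdw_mwf_white(a, mu=1.0, sigma_s2=1.0, sigma_n2=1.0):
    """Rank-1 SDW-MWF (R_s + mu R_n)^-1 R_s e_1 with R_s = sigma_s2 a a^H, R_n = sigma_n2 I."""
    norm2 = sum(abs(x) ** 2 for x in a)
    # Sherman-Morrison: the inverse applied to a equals a / (mu sigma_n2 + sigma_s2 ||a||^2)
    scale = sigma_s2 * a[0].conjugate() / (mu * sigma_n2 + sigma_s2 * norm2)
    return [scale * x for x in a]


def white_noise_gain(w, a):
    """|w^H a|^2 / ||w||^2: array gain against spatially white noise."""
    response = sum(wi.conjugate() * ai for wi, ai in zip(w, a))
    return abs(response) ** 2 / sum(abs(wi) ** 2 for wi in w)


def reliability(level, m=8, trials=10000, seed=0):
    """Monte Carlo estimate of P(white noise gain >= level) over random ATFs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = random_atf(m, rng)
        if white_noise_gain(sdw_mwf_white(a), a) >= level:
            hits += 1
    return hits / trials


print(reliability(4.0), reliability(8.0))
```

With M = 8 and these assumptions, ‖a‖² follows a Gamma(8, 1) distribution, so the second value should land near 0.45.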
Citations: 24
Noise Estimation Using a Constrained Sequential Hidden Markov Model in the Log-Spectral Domain
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2245648
D. Ying, Yonghong Yan
The temporal correlation of speech presence/absence is widely used in noise estimation. The most popular technique for exploiting it is to smooth the noisy spectra with a time-recursive filter whose forgetting factor is controlled by the speech presence probability. However, this technique is not built on a unified theoretical framework that enables optimal noise estimation. In theory, hidden Markov models (HMMs) are superior to this technique in modeling temporal correlation: an HMM can model the time sequence of speech presence/absence as a dynamic process of transitions between speech and non-speech states. Moreover, a number of methods, such as maximum likelihood, are available for optimally estimating HMM parameters. This paper presents a constrained sequential HMM for modeling the log-power sequence in each frequency band. The emission probability of each HMM state is represented by a Gaussian model, and the Gaussian mean of the non-speech state is taken as the optimal estimate of the noise log-power. The HMM parameter set is estimated sequentially, frame by frame, on the basis of maximum likelihood. The proposed method is compared with well-established algorithms through various experiments. It delivers more accurate results and, unlike most algorithms, does not rely on the assumption of a “non-speech signal onset”.
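A two-state version of the idea can be sketched for a single frequency band. Everything specific below is an assumption made for illustration:

- the shared variance;
- the symmetric transition probability;
- a forgetting factor that lets the online mean updates track slowly varying noise;
- a constraint keeping the speech mean at least `margin` above the noise mean, which is only one plausible reading of "constrained".

Each frame runs one normalized forward-algorithm step. It then moves each state's Gaussian mean toward the observation in proportion to that state's posterior. The non-speech mean is reported as the noise log-power estimate. This is not the paper's estimator.

```python
import math


def gauss_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def track_noise(log_powers, init_noise, init_speech, var=4.0,
                p_stay=0.95, forget=0.98, margin=3.0):
    means = [init_noise, init_speech]  # state 0: non-speech, state 1: speech
    counts = [1.0, 1.0]                # discounted soft occupancy counts
    alpha = [0.5, 0.5]                 # normalized forward probabilities
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    noise_track = []
    for x in log_powers:
        # Forward recursion: predict with the transition matrix, then weight by emissions
        pred = [sum(alpha[i] * trans[i][j] for i in range(2)) for j in range(2)]
        logs = [gauss_logpdf(x, means[j], var) for j in range(2)]
        top = max(logs)  # subtract the max log-likelihood for numerical stability
        joint = [pred[j] * math.exp(logs[j] - top) for j in range(2)]
        total = sum(joint)
        alpha = [p / total for p in joint]
        # Sequential, posterior-weighted update of each state's Gaussian mean
        for j in range(2):
            counts[j] = forget * counts[j] + alpha[j]
            means[j] += alpha[j] * (x - means[j]) / counts[j]
        # Hypothetical constraint: speech mean stays at least `margin` above the noise mean
        means[1] = max(means[1], means[0] + margin)
        noise_track.append(means[0])
    return noise_track


# Synthetic band: noise near 10 (log-power units) with speech bursts at 30
frames = [30.0 if n % 20 in range(10, 15) else 10.0 + 0.5 * math.sin(n) for n in range(200)]
noise = track_noise(frames, init_noise=5.0, init_speech=25.0)
print(round(noise[-1], 2))  # close to the true noise level of about 10
```

Because the speech bursts are absorbed by the speech state, the noise estimate is not dragged upward during them.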
Citations: 4
A Simple Prior for Audio Signals
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2245652
I. Bayram, M. Kamasak
We propose a simple prior for restoration problems involving oscillatory signals. The prior makes use of an underlying analytic frame decomposition with narrow subbands. Apart from this decomposition, the prior has no parameters, which makes it simple to use and apply. We demonstrate the utility of the proposed prior through real audio restoration experiments.
Citations: 15
Learning Phrase Patterns for Text Classification
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2245651
Bin Zhang, Alex Marin, Brian Hutchinson, Mari Ostendorf
This paper introduces methods to discriminatively learn phrase patterns for use as features in text classification. An efficient solution is described that uses a recursive algorithm with a mutual information selection criterion. The algorithm automatically determines when word classes are useful in specific locations of a phrase pattern, allowing variable specificity depending on the amount of labeled data available. Experiments on three text classification tasks in English and Chinese show improved performance when the phrase patterns are added to existing n-gram features.
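The selection criterion can be illustrated with a small sketch. A candidate pattern becomes a binary document feature, and its mutual information with the class label measures how useful it is. Patterns may mix literal words with word-class slots, written `<CLASS>` here. The class map, the pattern syntax, and the function names are all hypothetical, and the paper's recursive search over pattern refinements is not reproduced.

```python
import math
from collections import Counter


def pattern_matches(pattern, tokens, word_classes):
    """True if the pattern occurs contiguously; an item "<C>" matches any word of class C."""
    def item_ok(item, token):
        if item.startswith("<") and item.endswith(">"):
            return word_classes.get(token) == item[1:-1]
        return item == token

    k = len(pattern)
    return any(
        all(item_ok(p, t) for p, t in zip(pattern, tokens[i:i + k]))
        for i in range(len(tokens) - k + 1)
    )


def mutual_information(feature, labels):
    """I(F; Y) in bits between a discrete feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts, y_counts = Counter(feature), Counter(labels)
    mi = 0.0
    for (f, y), c in joint.items():
        p = c / n
        mi += p * math.log2(p * n * n / (f_counts[f] * y_counts[y]))
    return mi


def rank_patterns(candidates, docs, labels, word_classes):
    """Score each candidate pattern by the MI of its presence feature with the labels."""
    scored = []
    for pattern in candidates:
        feature = [pattern_matches(pattern, d, word_classes) for d in docs]
        scored.append((mutual_information(feature, labels), pattern))
    return sorted(scored, key=lambda item: item[0], reverse=True)


docs = [s.split() for s in ["fly to boston", "fly to denver", "weather in boston", "weather in denver"]]
labels = ["flight", "flight", "weather", "weather"]
classes = {"boston": "CITY", "denver": "CITY"}
for mi, pattern in rank_patterns([("boston",), ("to", "<CITY>")], docs, labels, classes):
    print(pattern, round(mi, 3))  # the class-generalized pattern scores 1 bit, "boston" scores 0
```

The example shows the trade-off the abstract describes. The literal city name is uninformative here, while generalizing it to a word class yields a perfectly discriminative pattern.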
Citations: 18
Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2248716
B. Lecouteux, G. Linarès, Y. Estève, G. Gravier
Combining automatic speech recognition (ASR) systems generally relies on posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that the primary search algorithm evaluates and combines with the others. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a 14.5% relative word error rate improvement over the best single system, compared with 6.7% for a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and its relatively low dependency on the search algorithm. Applying DDA to two different decoders, one of them beam-search-based, yields similar performance.
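Relative figures such as the 14.5% and 6.7% in this abstract are usually computed as (WER_baseline - WER_new) / WER_baseline. WER here is the word-level edit distance divided by the reference length. A minimal sketch of both quantities follows. The example numbers are hypothetical and do not come from the paper.

```python
def word_errors(ref, hyp):
    """Minimum substitutions + insertions + deletions turning ref into hyp (Levenshtein on words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (or match at zero cost)
        prev = cur
    return prev[-1]


def wer(ref, hyp):
    return word_errors(ref, hyp) / len(ref)


def relative_reduction(baseline_wer, new_wer):
    return (baseline_wer - new_wer) / baseline_wer


print(wer("the cat sat".split(), "the bat sat".split()))  # one substitution over three words
print(round(relative_reduction(20.0, 17.1), 3))  # 0.145, i.e. a 14.5% relative reduction
```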
Citations: 17
Lexical Prefix Tree and WFST: A Comparison of Two Dynamic Search Concepts for LVCSR
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2248723
David Rybach, H. Ney, R. Schlüter
Dynamic network decoders have the advantage of significantly lower memory consumption compared with static network decoders, especially when huge vocabularies and complex language models are required. This paper compares the properties of two well-known search strategies for dynamic network decoding: history-conditioned lexical tree search and weighted finite-state transducer-based search using on-the-fly transducer composition. The two search strategies share many common principles, such as dynamic programming and beam search. We point out the similarities between the two approaches and investigate the implications of their differing features, both formally and experimentally, with a focus on implementation-independent properties. To this end, experimental results are obtained with a single decoder by representing the history-conditioned lexical tree search strategy in the transducer framework. The properties analyzed cover the structure and size of the search space, differences in hypothesis recombination, language model look-ahead techniques, and lattice generation.
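The memory argument behind the lexical prefix tree can be seen in a few lines. Words that share a pronunciation prefix share arcs. A word's identity is only known at the end of its path, which is why language model look-ahead is needed to apply LM scores early. The toy lexicon and the names below are hypothetical.

```python
class TreeNode:
    def __init__(self):
        self.children = {}  # phoneme -> TreeNode
        self.words = []     # words whose pronunciation ends at this node


def build_prefix_tree(lexicon):
    """Return (root, arc_count) for a pronunciation prefix tree."""
    root = TreeNode()
    arcs = 0
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            if p not in node.children:
                node.children[p] = TreeNode()
                arcs += 1
            node = node.children[p]
        node.words.append(word)
    return root, arcs


def reachable_words(node):
    """Words still reachable from a node: the set over which LM look-ahead takes the best LM score."""
    found = list(node.words)
    for child in node.children.values():
        found.extend(reachable_words(child))
    return found


lexicon = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"], "dog": ["d", "ao", "g"]}
root, arcs = build_prefix_tree(lexicon)
print(arcs, sum(len(p) for p in lexicon.values()))  # 7 arcs in the tree vs 9 in a flat lexicon
print(sorted(reachable_words(root.children["k"])))  # ['cap', 'cat']
```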
Citations: 11
Position-Dependent Crosstalk Cancellation Using Space Partitioning
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2248713
Ki-Seung Lee
The present study tested a new stereo playback system that effectively cancels crosstalk signals at an arbitrary listening position. The system was implemented by integrating listener position tracking with crosstalk cancellation. The entire listening space was partitioned into a number of non-overlapping cells, and a crosstalk cancellation filter was assigned to each cell. The space partitions and the corresponding crosstalk cancellation filters were constructed by maximizing the average channel separation ratio (CSR). Because the cancellation is cell-based, the exact position of the listener did not need to be estimated; it was only necessary to determine the cell in which the listener was located. This was achieved with an artificial neural network (ANN) whose input was the time delay for each pair of microphones and whose output corresponded to the cell index. The experimental results showed that more than 95% of the experimental listening space had a CSR ≥ 10 dB when the number of clusters exceeded 12. Under these conditions, the correlation between the true directions of the virtual sound sources and the directions identified by the subjects was greater than 0.9.
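At a single frequency, the crosstalk canceller for one cell can be sketched as the inverse of the 2×2 acoustic transfer matrix measured at the cell center, H[ear][loudspeaker]. The channel separation ratio is then read off the combined response H·C: the direct term over the crosstalk term, in dB, averaged over both ears.

This is an assumed, simplified definition. The paper designs its filters by maximizing an average CSR over each cell, not by exact inversion, and the transfer values below are made up. The sketch shows why a filter tuned to the cell center still gives finite, useful separation elsewhere in the cell.

```python
import cmath
import math


def inverse_2x2(h):
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def csr_db(h_actual, ctc):
    """Average over both ears of 20 log10(|direct| / |crosstalk|) for the response H C."""
    r = matmul_2x2(h_actual, ctc)
    ratios = []
    for ear in range(2):
        direct = abs(r[ear][ear])
        crosstalk = abs(r[ear][1 - ear])
        if crosstalk == 0.0:
            return math.inf
        ratios.append(20.0 * math.log10(direct / crosstalk))
    return sum(ratios) / len(ratios)


def transfer(phase_l, phase_r, g=0.5):
    """Hypothetical H: unit direct paths, attenuated and phase-shifted crosstalk paths."""
    return [[1.0, g * cmath.exp(-1j * phase_l)], [g * cmath.exp(-1j * phase_r), 1.0]]


h_center = transfer(0.8, 0.8)
ctc = inverse_2x2(h_center)  # filter assigned to this cell
print(csr_db(h_center, ctc))            # very large (or inf): exact cancellation at the center
print(csr_db(transfer(1.0, 0.6), ctc))  # finite, about 20 dB at this off-center position
```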
Citations: 3
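The cell-based lookup described in the crosstalk-cancellation abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The abstract does not say how the pairwise time delays are measured, so they are estimated here from the cross-correlation peak. A nearest-centroid rule stands in for the trained ANN. The centroids, the toy burst signals and the `ctc_filter_cell*` names are all hypothetical.

```python
import math
from itertools import combinations


def tdoa_samples(x, y, max_lag):
    """Lag (in samples) maximizing sum_n x[n] * y[n + lag] over |lag| <= max_lag.

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y))
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def tdoa_features(mics, max_lag):
    """Time delay for every microphone pair (a, b) with a < b: the classifier input."""
    return [tdoa_samples(mics[a], mics[b], max_lag)
            for a, b in combinations(range(len(mics)), 2)]


def nearest_cell(features, centroids):
    """Index of the cell whose centroid is closest in squared Euclidean distance.

    Hypothetical stand-in for the trained ANN classifier described in the abstract.
    """
    return min(range(len(centroids)),
               key=lambda k: sum((f - c) ** 2 for f, c in zip(features, centroids[k])))


def burst_at(offset, length=40, burst=(1.0, -2.0, 3.0, 1.0)):
    """Toy microphone signal: a short burst starting at sample `offset`."""
    sig = [0.0] * length
    for k, v in enumerate(burst):
        sig[offset + k] = v
    return sig


# Hypothetical setup: three microphones, three cells, one crosstalk filter per cell.
mics = [burst_at(10), burst_at(13), burst_at(8)]
centroids = [[0, 0, 0], [3, -2, -5], [-3, 2, 5]]
cell_filters = {0: "ctc_filter_cell0", 1: "ctc_filter_cell1", 2: "ctc_filter_cell2"}

features = tdoa_features(mics, max_lag=6)
cell = nearest_cell(features, centroids)
print(features, cell, cell_filters[cell])  # prints: [3, -2, -5] 1 ctc_filter_cell1
```

Only a cell index is needed, so the classifier can be coarse. Any listener position that maps to the correct cell receives that cell's precomputed filter, which is why exact localization is unnecessary.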
Underdetermined Sound Source Separation Using Power Spectrum Density Estimated by Combination of Directivity Gain
Pub Date : 2013-06-01 DOI: 10.1109/TASL.2013.2248715
Yusuke Hioka, K. Furuya, Kazunori Kobayashi, K. Niwa, Y. Haneda
A method for separating underdetermined sound sources based on a novel power spectral density (PSD) estimation is proposed. Using a microphone array of M sensors and a Wiener post-filter calculated from the estimated PSDs, the method can separate up to M(M-1)+1 sources. The PSD of a beamformer's output is modelled as a mixture of the source PSDs, each multiplied by the beamformer's directivity gain in the direction where that source is located. Based on this model, the PSD of each sound source is estimated from the output PSDs of multiple fixed beamformers, exploiting the differences among their combinations of directivity gains. Simulation results showed that the proposed method effectively separated up to M(M-1)+1 sound sources when the fixed beamformers were appropriately selected. Experiments were also conducted in a reverberant chamber to confirm that the method is effective in practical use.
Citations: 58
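The PSD model in the underdetermined-separation abstract above amounts to a linear system at each frequency bin. Each fixed beamformer's output PSD is a sum of the source PSDs, each weighted by that beamformer's directivity gain toward the source. The sketch below makes several illustrative assumptions, and none of them is the authors' exact formulation. The gains and source directions are taken as known. The source PSDs are recovered by ordinary least squares, with negative estimates clipped to zero. The post-filter is a Wiener-style gain equal to the target's share of the modelled output PSD. All numbers are made up.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_source_psds(gains, beam_psds):
    """Least-squares fit of beam_psds[k] ~ sum_j gains[k][j] * source_psd[j] (normal equations).

    Negative estimates are clipped to zero, since a PSD cannot be negative.
    """
    n_src = len(gains[0])
    gtg = [[sum(g[i] * g[j] for g in gains) for j in range(n_src)] for i in range(n_src)]
    gty = [sum(g[i] * y for g, y in zip(gains, beam_psds)) for i in range(n_src)]
    return [max(0.0, v) for v in solve(gtg, gty)]


def wiener_gain(gains, source_psds, beam, target):
    """Post-filter gain for `target` on beamformer `beam`: its share of the modelled output PSD."""
    total = sum(g * s for g, s in zip(gains[beam], source_psds))
    return gains[beam][target] * source_psds[target] / total if total > 0 else 0.0


# Hypothetical directivity gains |D_k(theta_j)|^2: 4 fixed beamformers (rows) x 3 sources (columns).
gains = [
    [1.0, 0.2, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.4, 1.0],
    [0.5, 0.5, 0.5],
]
true_psds = [2.0, 1.0, 0.5]
beam_psds = [sum(g * s for g, s in zip(row, true_psds)) for row in gains]  # noiseless model

est = estimate_source_psds(gains, beam_psds)
gain0 = wiener_gain(gains, est, beam=0, target=0)
print([round(v, 6) for v in est], round(gain0, 4))  # prints: [2.0, 1.0, 0.5] 0.8889
```

With more beamformers than sources, least squares averages over the redundant observations. The point of the abstract's beamformer selection is to make the gain matrix well conditioned, so that these equations stay solvable.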