
IEEE Trans. Speech Audio Process.: Latest Articles

Bayesian learning of speech duration models
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818114
Jen-Tzung Chien, Chih-Hsien Huang
This paper presents Bayesian speech duration modeling and learning for hidden Markov model (HMM) based speech recognition. We focus on the sequential learning of HMM state durations using the quasi-Bayes (QB) estimate. The adapted duration models are robust to nonstationary speaking rates and noise conditions. In this study, the Gaussian, Poisson, and gamma distributions are investigated to characterize the duration models. The maximum a posteriori (MAP) estimate of the gamma duration model is developed. To exploit sequential learning, we adopt the Poisson duration model with a gamma prior density, which belongs to the conjugate prior family. When the adaptation data are sequentially observed, the resulting gamma posterior density offers two advantages. One is to determine the optimal QB duration parameter, which can be merged into HMMs for speech recognition. The other is to build the updating mechanism of the gamma prior statistics for sequential learning. The EM algorithm is applied to perform QB parameter estimation. The adaptation of the overall HMM parameters can be performed simultaneously. In the experiments, the proposed adaptive duration model improves speech recognition performance on Mandarin broadcast news and noisy connected digits. Batch and sequential learning are investigated for the MAP and QB duration models, respectively.
Pages: 558-567
Citations: 32
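The conjugate Poisson-gamma update mentioned in the abstract above can be illustrated with a small sketch. It assumes the state durations are directly observed, which is a simplification: the paper applies EM because HMM state durations are hidden. The names `GammaPrior`, `update`, and `map_rate`, and the prior values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GammaPrior:
    """Gamma density over a Poisson rate (shape/rate parameterization)."""
    shape: float
    rate: float


def update(prior, durations):
    """Conjugate update: add the duration sum to the shape, the count to the rate."""
    return GammaPrior(prior.shape + sum(durations), prior.rate + len(durations))


def map_rate(p):
    """Mode of the gamma density, used here as the point estimate of the rate."""
    return max(p.shape - 1.0, 0.0) / p.rate


# Sequential learning: each posterior becomes the prior for the next batch.
posterior = GammaPrior(shape=5.0, rate=1.0)  # assumed initial hyperparameters
for batch in ([3, 4, 5], [6, 7]):            # assumed observed durations (frames)
    posterior = update(posterior, batch)

print(posterior, map_rate(posterior))
```

Because the posterior stays in the gamma family, processing the batches one at a time gives the same hyperparameters as processing all durations at once, which is what makes the sequential scheme convenient.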
Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818075
Jeff Z. Ma, L. Deng
In this paper, we present two efficient strategies for likelihood computation and decoding in a continuous speech recognizer using an underlying nonlinear state-space dynamic model for the hidden speech dynamics. The state-space model has been specially constructed so as to be suitable for the conversational or casual style of speech where phonetic reduction abounds. Two specific decoding algorithms, based on optimal state-sequence estimation for the nonlinear state-space model, are derived, implemented, and evaluated. They successfully overcome the exponential growth in the original search paths by using the path-merging approaches derived from Bayes' rule. We have tested and compared the two algorithms using the speech data from the Switchboard corpus, confirming their effectiveness. Conversational speech recognition experiments using the Switchboard corpus further demonstrated that the use of the new decoding strategies is capable of reducing the recognizer's word error rate compared with two baseline recognizers, including the HMM system and the nonlinear state-space model using the HMM-produced phonetic boundaries, under identical test conditions.
Pages: 590-602
Citations: 27
Binaural cue coding-Part I: psychoacoustic fundamentals and design principles
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818109
F. Baumgarte, C. Faller
Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and it is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction since only a single down-mixed audio channel needs to be transmitted. An alternative BCC scheme for efficient joint transmission of independent source signals supports flexible spatial rendering at the decoder. This paper (Part I) discusses the most relevant binaural perception phenomena exploited by BCC. Based on that, it presents a psychoacoustically motivated approach for designing a BCC analyzer and synthesizer. This leads to a reference implementation for analysis and synthesis of stereophonic audio signals based on a Cochlear Filter Bank. BCC synthesizer implementations based on the FFT are presented as low-complexity alternatives. A subjective audio quality assessment of these implementations shows the robust performance of BCC for critical speech and audio material. Moreover, the results suggest that the performance given by the reference synthesizer is not significantly compromised when using a low-complexity FFT-based synthesizer. The companion paper (Part II) generalizes BCC analysis and synthesis for multichannel audio and proposes complete BCC schemes including quantization and coding. Part II also describes an alternative BCC scheme with flexible rendering capability at the decoder and proposes several applications for both BCC schemes.
Pages: 509-519
Citations: 175
Analysis of two-channel generalized sidelobe canceller (GSC) with post-filtering
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818105
I. Cohen
In this paper, we analyze a two-channel generalized sidelobe canceller with post-filtering in nonstationary noise environments. The post-filtering includes detection of transients at the beamformer output and reference signal, a comparison of their transient power, estimation of the signal presence probability, estimation of the noise spectrum, and spectral enhancement for minimizing the mean-square error of the log-spectra. Transients are detected based on a measure of their local nonstationarity, and classified as desired or interfering based on the transient beam-to-reference ratio. We introduce a transient discrimination quality measure, which quantifies the beamformer's capability to recognize noise transients as distinct from signal transients. Evaluating this measure in various noise fields shows that desired and interfering transients can generally be differentiated within a wide range of frequencies. To further improve the transient noise reduction at low and high frequencies in case the signal is wideband, we estimate for each time frame a global likelihood of signal presence. The global likelihood is associated with the transient beam-to-reference ratios in frequencies, where the transient discrimination quality is high. Experimental results demonstrate the usefulness of the proposed approach in various car environments.
Pages: 684-699
Citations: 46
On the use of linguistic consistency in systems for human-computer dialogues
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818318
Y. Estève, C. Raymond, R. Mori, D. Janiszek
This paper introduces new recognition strategies based on reasoning about results obtained with different Language Models (LMs). Strategies are built following the conjecture that the consensus among the results obtained with different models gives rise to different situations in which hypothesized sentences have different word error rates (WER) and may be further processed with other LMs. New LMs are built by data augmentation using ideas from latent semantic analysis and trigram analogy. Situations are defined by expressing the consensus among the recognition results produced with different LMs and by the number of unobserved trigrams in the hypothesized sentence. The diagnostic power of the use of observed trigrams or their corresponding class trigrams is compared with that of situations based on values of sentence posterior probabilities. In order to avoid or correct errors due to syntactic inconsistency of the recognized sentence, automata obtained by explanation-based learning are introduced and used under certain conditions. Semantic Classification Trees are introduced to provide sentence patterns expressing constraints of long-distance syntactic coherence. Results on a dialogue corpus provided by France Telecom R&D have shown that, starting with a WER of 21.87% on a test set of 1422 sentences, it is possible to subdivide the sentences into three sets characterized by automatically recognized situations. The first one has a coverage of 68% with a WER of 7.44%. The second one has various types of sentences with a WER around 20%. The third one contains 13% of the sentences, which should be rejected, with a WER around 49%. The second set characterizes sentences that should be processed with particular care by the dialogue interpreter, with the possibility of asking the user for confirmation.
Pages: 746-756
Citations: 10
Lossless compression of digital audio using cascaded RLS-LMS prediction
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818111
R. Yu, C. Ko
This paper proposes a cascaded RLS-LMS predictor for lossless audio coding. In this proposed predictor, a high-order LMS predictor is employed to model the ample tonal and harmonic components of the audio signal for optimal prediction gain performance. To solve the slow convergence problem of the LMS algorithm with colored inputs, a low-order RLS predictor is cascaded prior to the LMS predictor to remove the spectral tilt of the audio signal. This cascaded RLS-LMS structure effectively mitigates the slow convergence problem of the LMS algorithm and provides superior prediction gain performance compared with the conventional LMS predictor, resulting in a better overall compression performance.
Pages: 532-537
Citations: 28
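The cascade idea in the abstract above (a low-order RLS stage that flattens the spectrum, followed by a high-order LMS stage) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' codec: it uses floating-point arithmetic, a normalized LMS update, and simply feeds the RLS residual to the LMS stage. The function names, filter orders, step size, forgetting factor, and test signal are all hypothetical, and the integer rounding and entropy coding a lossless coder needs are omitted.

```python
import numpy as np


def past(x, n, order):
    """Return [x[n-1], ..., x[n-order]], zero-padded before the signal start."""
    u = np.zeros(order)
    m = min(n, order)
    u[:m] = x[n - m:n][::-1]
    return u


def rls_residual(x, order=2, lam=0.999, delta=100.0):
    """Low-order RLS one-step predictor; returns the a priori prediction error."""
    w = np.zeros(order)
    P = delta * np.eye(order)            # estimate of the inverse correlation matrix
    e = np.zeros(len(x))
    for n in range(len(x)):
        u = past(x, n, order)
        e[n] = x[n] - w @ u
        Pu = P @ u
        g = Pu / (lam + u @ Pu)          # RLS gain vector
        w = w + g * e[n]
        P = (P - np.outer(g, Pu)) / lam
    return e


def nlms_residual(x, order=16, mu=0.5, eps=1e-8):
    """High-order normalized LMS one-step predictor; returns the prediction error."""
    w = np.zeros(order)
    e = np.zeros(len(x))
    for n in range(len(x)):
        u = past(x, n, order)
        e[n] = x[n] - w @ u
        w = w + mu * e[n] * u / (eps + u @ u)
    return e


# Assumed test signal: two tones plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(4000)
x = (np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
     + 0.01 * rng.standard_normal(len(t)))

stage1 = rls_residual(x)        # RLS stage reduces the spectral tilt
stage2 = nlms_residual(stage1)  # LMS stage models the remaining tonal structure
print(np.mean(x[2000:] ** 2), np.mean(stage2[2000:] ** 2))
```

In a coder, the final residual rather than the signal itself would be passed to the entropy coder, so a smaller residual energy generally translates into fewer bits.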
Microphone array post-filter based on noise field coherence
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818212
I. McCowan, H. Bourlard
This paper introduces a novel technique for estimating the signal power spectral density to be used in the transfer function of a microphone array post-filter. The technique is a generalization of the existing Zelinski post-filter, which uses the auto- and cross-spectral densities of the array inputs to estimate the signal and noise spectral densities. The Zelinski technique, however, assumes zero cross-correlation between the noise on different sensors. This assumption is inaccurate, particularly at low frequencies and for arrays with closely spaced sensors, and thus the corresponding post-filter is suboptimal in realistic noise conditions. In this paper, a more general expression of the post-filter estimation is developed based on an assumed knowledge of the complex coherence of the noise field. This general expression can be used to construct a more appropriate post-filter in a variety of different noise fields. In experiments using real noise recordings from a computer office, the modified post-filter results in significant improvement in terms of objective speech quality measures and speech recognition performance using a diffuse noise model.
Pages: 709-716
Citations: 315
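To see why the zero cross-correlation assumption discussed in the abstract above breaks down at low frequencies and small sensor spacings, it may help to evaluate the coherence of an idealized diffuse (spherically isotropic) noise field, which follows a sinc function of frequency times spacing. The sketch below assumes that idealized field; the function name, sensor spacing, and speed of sound are illustrative choices, not values taken from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def diffuse_coherence(freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Coherence between two omnidirectional sensors in an ideal diffuse field.

    np.sinc(x) is sin(pi*x)/(pi*x), so this equals sin(2*pi*f*d/c)/(2*pi*f*d/c).
    """
    return np.sinc(2.0 * np.asarray(freq_hz, dtype=float) * spacing_m / c)


freqs = np.array([0.0, 100.0, 1000.0, 3430.0])
gamma = diffuse_coherence(freqs, spacing_m=0.05)  # 5 cm spacing (assumed)
for f, g in zip(freqs, gamma):
    print(f"{f:7.1f} Hz  coherence {g:+.3f}")
```

With closely spaced sensors the coherence stays near one over the low-frequency range, so treating the noise as uncorrelated there misattributes noise power to the signal.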
Computation of linear filter networks containing delay-free loops, with an application to the waveguide mesh
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818033
Federico Fontana
A method for computing linear digital filter networks containing delay-free loops is proposed. Compared with existing techniques, the proposed method does not require a rearrangement of the network structure; instead, it makes use of matrices describing this structure and specifying the connections between the filter blocks forming the network. For this reason, the efficiency of the method becomes attractive when the filter blocks are densely interconnected. The Triangular Waveguide Mesh is an example of a "dense" filter network: using the proposed method, we can compute a transformed, delay-free version of this mesh, obtaining simulations that are significantly more accurate than those provided by the traditional, explicitly computable formulation of the triangular mesh.
Pages: 774-782
Citations: 17
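As a generic illustration of what computing a delay-free loop involves (not the specific matrix formulation of the paper above), consider a network whose block outputs at one time instant depend instantaneously on each other: y = A y + B x. Instead of rearranging the network, the outputs can be obtained by solving the linear system (I - A) y = B x. The connection matrix `A`, input vector `B`, and function name below are hypothetical.

```python
import numpy as np

# Assumed instantaneous connections: block 0 hears 0.5 * block 1,
# block 1 hears 0.25 * block 0, and only block 0 receives the input.
A = np.array([[0.0, 0.5],
              [0.25, 0.0]])
B = np.array([1.0, 0.0])


def solve_delay_free(A, B, x):
    """Solve y = A @ y + B * x for y without reordering the network."""
    return np.linalg.solve(np.eye(len(A)) - A, B * x)


y = solve_delay_free(A, B, x=2.0)
print(y)
```

The solve is well defined as long as I - A is nonsingular; for a fixed network the matrix can be factored once and reused at every sample.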
Window optimization in linear prediction analysis
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818213
W. Chu
The autocorrelation method of linear prediction (LP) analysis relies on a window for data extraction. We propose a gradient-descent-based approach to optimizing the window. It is shown that the optimized window can enhance the performance of LP-based speech coding algorithms; in most instances, the improvement in performance comes at no additional computational cost, since it merely requires replacing the window.
Pages: 626-635
Citations: 7
Nonlinear acoustic echo cancellation based on Volterra filters
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818077
A. Guérin, G. Faucon, R. Bouquin-Jeannès
This paper describes a nonlinear acoustic echo cancellation algorithm that focuses mainly on loudspeaker distortions. The proposed system consists of two distinct modules arranged in cascade: a nonlinear module based on polynomial Volterra filters models the loudspeaker, and a second module of standard linear filtering identifies the impulse response of the acoustic path. The overall system model is tracked by a modified normalized least-mean-square (NLMS) algorithm, for which the equations are derived. Stability conditions are given, and particular attention is paid to the transient behavior of the cascaded filters. Finally, results on real data recorded with Alcatel GSM equipment are presented.
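The cascade structure can be sketched as follows, under strong simplifying assumptions. The loudspeaker stage is a fixed memoryless polynomial with made-up coefficients rather than an adaptive Volterra filter with memory, and the linear stage is adapted with the standard NLMS update, not the modified algorithm derived in the paper. All names (`loudspeaker_model`, `nlms_identify`) and all numeric values are hypothetical.

```python
import numpy as np

def loudspeaker_model(x, coeffs):
    """Memoryless polynomial distortion: sum_k coeffs[k] * x**(k + 1).

    Stand-in for the nonlinear module; coefficients are hypothetical.
    """
    return sum(c * x ** (k + 1) for k, c in enumerate(coeffs))

def nlms_identify(u, d, taps, mu=0.5, eps=1e-8):
    """Standard NLMS: adapt FIR weights w so that w . [u[n], u[n-1], ...] tracks d[n]."""
    w = np.zeros(taps)
    buf = np.zeros(taps)                    # recent inputs, newest first
    for n in range(len(u)):
        buf = np.roll(buf, 1)
        buf[0] = u[n]
        e = d[n] - np.dot(w, buf)           # a priori error
        w += mu * e * buf / (np.dot(buf, buf) + eps)
    return w

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)            # far-end signal
z = loudspeaker_model(x, [1.0, 0.2, 0.05])  # distorted loudspeaker output
h = np.array([0.8, -0.4, 0.2, 0.1])         # hypothetical acoustic path
d = np.convolve(z, h)[: len(z)]             # noise-free echo at the microphone
w = nlms_identify(z, d, taps=len(h))        # estimate of the acoustic path
```

Feeding the linear stage with the distorted signal `z` instead of `x` is what lets a purely linear filter match the echo in this toy setup. In the paper both stages are adapted jointly, which is where the stability conditions and the transient behavior of the cascade come into play.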
Pages : 672-683
Citations: 160