IEEE Transactions on Audio Speech and Language Processing: Latest Articles

Convergence Analysis of Narrowband Feedback Active Noise Control System With Imperfect Secondary Path Estimation
Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2277934
Liang Wang, W. Gan, Andy W. H. Khong, S. Kuo
In many practical active noise control (ANC) applications, a feedback structure that uses the estimated secondary path to synthesize the reference signal is preferred under various conditions. This paper analyzes the convergence behavior of narrowband feedback ANC systems with imperfect secondary path estimation. Existing approaches do not analyze the reference-signal synthesis errors because of their interrelated feedback nature. In this paper, the reconstruction error is modeled in terms of the secondary path estimation error. Using this model, the effects of estimation errors on the convergence of the feedback ANC system are investigated. To further examine the effects of errors in the filtered-x and filtered-y signal paths, these two paths are analyzed separately to isolate the effect of each. Computer simulations are conducted to verify the theoretical analysis presented in the paper.
Citations: 21
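The feedback structure in the narrowband feedback ANC paper can be illustrated with a minimal single-channel simulation. This is a sketch under stated assumptions, not the authors' model or analysis: FIR secondary paths with at least one sample of delay, the common filtered-x LMS (FxLMS) update, and hypothetical values for the path coefficients, the filter length `L`, and the step size `mu`. The reference is synthesized as x(n) = e(n) + (s_hat * y)(n), so any mismatch between `s` and `s_hat` leaks anti-noise back into the reference, which is the coupling the paper analyzes.

```python
import numpy as np


def feedback_fxlms(d, s, s_hat, L=16, mu=0.005):
    """Single-channel feedback ANC with FxLMS (illustrative sketch).

    d     : disturbance sampled at the error microphone
    s     : true secondary-path FIR coefficients (hypothetical)
    s_hat : estimated secondary path, used to synthesize AND to filter the reference
    Assumes s[0] == s_hat[0] == 0 (one-sample delay) so the loop stays causal.
    """
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    if s[0] != 0.0 or s_hat[0] != 0.0:
        raise ValueError("sketch assumes a one-sample delay in s and s_hat")
    w = np.zeros(L)                                  # adaptive control filter
    x_buf = np.zeros(L)                              # x(n), ..., x(n-L+1)
    xf_buf = np.zeros(L)                             # filtered reference x'(n), ...
    x_hist = np.zeros(len(s_hat))                    # reference history for s_hat filtering
    y_past = np.zeros(max(len(s), len(s_hat)) - 1)   # y(n-1), y(n-2), ...
    e = np.zeros(len(d))
    for n in range(len(d)):
        # Residual at the error mic: disturbance minus anti-noise filtered by the true path.
        e[n] = d[n] - s[1:] @ y_past[: len(s) - 1]
        # Reference synthesis: add back the *estimated* anti-noise contribution.
        x = e[n] + s_hat[1:] @ y_past[: len(s_hat) - 1]
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x
        x_hist = np.roll(x_hist, 1)
        x_hist[0] = x
        xf_buf = np.roll(xf_buf, 1)
        xf_buf[0] = s_hat @ x_hist                   # filtered-x signal x'(n)
        y = w @ x_buf                                # anti-noise sample y(n) with w(n)
        w = w + mu * e[n] * xf_buf                   # FxLMS update (e = d - s*y convention)
        y_past = np.roll(y_past, 1)
        y_past[0] = y
    return e
```

For example, driving it with `d = np.sin(2 * np.pi * 0.05 * np.arange(5000))` and comparing `s_hat = s` against a 10% gain error (`s_hat = [0.0, 0.72, 0.27]` for `s = [0.0, 0.8, 0.3]`) shows how the residual `e` decays in each case. With a perfect estimate the synthesized reference equals `d` exactly; with a mismatch it does not.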
Unsupervised Spoken Language Understanding for a Multi-Domain Dialog System
Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2280212
Donghyeon Lee, Minwoo Jeong, Kyungduk Kim, Seonghan Ryu, G. G. Lee
This paper proposes an unsupervised spoken language understanding (SLU) framework for a multi-domain dialog system. Our unsupervised SLU framework applies a non-parametric Bayesian approach to dialog acts, intents, and slot entities, which are the components of a semantic frame. The proposed approach reduces the human effort needed to obtain a semantically annotated corpus for dialog system development. In this study, we analyze clustering results on four dialog corpora using various evaluation metrics. We also introduce a multi-domain dialog system that uses the unsupervised SLU framework. We argue that our unsupervised approach can help overcome the annotation acquisition bottleneck in developing dialog systems. To verify this claim, we report a dialog system evaluation in which our method achieves results competitive with a system that uses a manually annotated corpus. In addition, we conducted several experiments to explore the effect of our approach on reducing development costs. The results show that our approach is helpful for the rapid development of a prototype system and for reducing overall development costs.
Citations: 19
Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio
Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2263144
T. Virtanen, J. Gemmeke, B. Raj
This paper proposes a computationally efficient algorithm for estimating the non-negative weights of linear combinations of the atoms of large-scale audio dictionaries, such that the generalized Kullback-Leibler divergence between an audio observation and the model is minimized. This linear model has proven useful in many audio signal processing tasks, but existing algorithms are computationally slow when a large number of atoms is used. The proposed algorithm iteratively updates a set of active atoms, with the weights updated using the Newton method and the step size chosen so that the weights remain non-negative. Convergence evaluations on audio spectra of two-speaker mixtures show that, for all tested dictionary sizes, the proposed method reaches a much lower divergence than conventional algorithms and is up to 8 times faster. A source separation evaluation shows that, with large dictionaries, the proposed method yields better separation quality in less time.
Citations: 70
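For context on the objective in the active-set Newton paper, the sketch below computes the generalized Kullback-Leibler divergence D(x || Bw) = sum(x log(x / Bw) - x + Bw) and estimates non-negative weights with the widely used multiplicative update. This is a conventional baseline of the kind such methods are compared against, not the proposed active-set Newton algorithm. The dictionary `B`, the observation `x`, the iteration count, and the `eps` guard are illustrative assumptions.

```python
import numpy as np


def generalized_kl(x, y, eps=1e-12):
    """Generalized KL divergence D(x || y) = sum(x * log(x / y) - x + y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # eps keeps log() finite when x or y contains zeros.
    return float(np.sum(x * np.log((x + eps) / (y + eps)) - x + y))


def kl_weights_multiplicative(B, x, n_iter=200, eps=1e-12):
    """Estimate non-negative w for D(x || B w) with multiplicative updates.

    B: (F, K) non-negative dictionary, atoms as columns.
    x: (F,) non-negative observation (e.g. a magnitude spectrum).
    Each update keeps w >= 0 and does not increase the divergence.
    """
    w = np.ones(B.shape[1])              # strictly positive start
    atom_sums = B.sum(axis=0) + eps      # denominator: sum_f B[f, k]
    for _ in range(n_iter):
        model = B @ w + eps
        w *= (B.T @ (x / model)) / atom_sums
    return w
```

Each iteration touches every atom, which hints at why such updates become slow for very large dictionaries, the regime the paper targets.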
Robust SVD-Based Audio Watermarking Scheme With Differential Evolution Optimization
Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2277929
B. Lei, I. Soon, Ee-Leng Tan
In this paper, a robust audio watermarking scheme based on singular value decomposition (SVD) and differential evolution (DE), using a dither modulation (DM) quantization algorithm, is proposed. Two novel SVD-based algorithms, lifting wavelet transform (LWT)-discrete cosine transform (DCT)-SVD and discrete wavelet transform (DWT)-DCT-SVD, are developed for audio copyright protection. In our method, LWT/DWT is first applied to decompose the host signal and obtain the corresponding approximation coefficients, followed by DCT to exploit its energy-compaction property. SVD is then performed to obtain the singular values and enhance the robustness of the scheme. Adaptive DM quantization is adopted to quantize the singular values and embed the watermark. To withstand desynchronization attacks, a synchronization code is inserted using audio statistical characteristics. Furthermore, the conflict between robustness and imperceptibility is effectively resolved by DE optimization. Simulation results demonstrate that both the LWT-DCT-SVD and DWT-DCT-SVD methods not only have good imperceptibility but also resist general signal processing, hybrid, and desynchronization attacks. Compared with the previous DWT-DCT, support vector regression (SVR)-DWT-DCT, and DWT-SVD methods, our method is more robust against the selected attacks.
Citations: 106
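The core embedding step of the SVD/DM watermarking scheme can be sketched as quantizing the largest singular value of a coefficient block onto one of two interleaved lattices. This is a simplified illustration, not the authors' method. The LWT/DWT and DCT stages, the synchronization code, and the DE optimization are all omitted, and the quantization step `delta` is a hand-picked constant. The block is also assumed to have a clear gap between its first and second singular values so that the modified value stays the largest.

```python
import numpy as np


def dm_quantize(value, bit, delta):
    """Dither modulation: bit 0 -> lattice delta*Z, bit 1 -> delta*Z + delta/2."""
    offset = 0.0 if bit == 0 else delta / 2.0
    return delta * np.round((value - offset) / delta) + offset


def dm_decode(value, delta):
    """Minimum-distance decision between the two shifted lattices."""
    err0 = abs(value - dm_quantize(value, 0, delta))
    err1 = abs(value - dm_quantize(value, 1, delta))
    return 0 if err0 <= err1 else 1


def embed_bit(block, bit, delta):
    """Embed one bit by re-quantizing the largest singular value of a 2-D block."""
    U, S, Vt = np.linalg.svd(block, full_matrices=False)
    S = S.copy()
    S[0] = dm_quantize(S[0], bit, delta)
    return U @ np.diag(S) @ Vt


def extract_bit(block, delta):
    """Blind extraction: only the singular values of the received block are needed."""
    S = np.linalg.svd(block, compute_uv=False)
    return dm_decode(S[0], delta)
```

A larger `delta` tolerates more distortion (decoding stays correct while the singular value moves by less than `delta / 4`) at the cost of audibility, which is the trade-off the paper hands to DE.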
Robust Segments Detector for De-Synchronization Resilient Audio Watermarking
Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2279312
Chi-Man Pun, Xiaochen Yuan
A robust feature-point detector for invariant audio watermarking is proposed in this paper. Audio segments centered at the detected feature points are extracted for both watermark embedding and extraction. These feature points are invariant to various attacks and change little under attacks that preserve high auditory quality. In addition, high robustness and inaudibility are achieved by embedding the watermark into the approximation coefficients of the shift-invariant stationary wavelet transform (SWT) domain. A spread-spectrum communication technique is adopted to embed the watermark. Experimental results show that the proposed Robust Audio Segments Extractor (RASE) and watermarking scheme are robust not only against common audio signal processing, such as low-pass filtering, MP3 compression, echo addition, volume change, and normalization, and against the distortions introduced in the StirMark Benchmark for Audio, but also against geometric (desynchronization) distortions, such as resampling time-scale modification (TSM) with scaling factors up to ±50%, pitch-invariant TSM by ±50%, and tempo-invariant pitch shifting by ±50%. Overall, the joint RASE and SWT approach resists a wide range of attacks and performs much better than existing state-of-the-art methods.
Citations: 51
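Spread-spectrum embedding, which the RASE paper adopts, can be sketched as adding a key-dependent ±1 pseudo-noise (PN) sequence to a run of coefficients and detecting each bit by correlation. In this sketch a plain vector stands in for the SWT approximation coefficients of a detected segment, so segment detection and the SWT itself are omitted. The seed-as-key scheme and the values of `chip_len` and `alpha` are hypothetical choices.

```python
import numpy as np


def _pn_sequence(chip_len, key):
    # Hypothetical key handling: embedder and detector regenerate the PN sequence from a shared seed.
    return np.random.default_rng(key).choice([-1.0, 1.0], size=chip_len)


def ss_embed(coeffs, bits, chip_len, alpha, key=1234):
    """Add +alpha*pn for bit 1 and -alpha*pn for bit 0 over consecutive chip_len runs."""
    pn = _pn_sequence(chip_len, key)
    marked = np.asarray(coeffs, dtype=float).copy()
    if len(bits) * chip_len > len(marked):
        raise ValueError("not enough coefficients for the payload")
    for i, bit in enumerate(bits):
        seg = slice(i * chip_len, (i + 1) * chip_len)
        marked[seg] += alpha * (1.0 if bit else -1.0) * pn
    return marked


def ss_detect(coeffs, n_bits, chip_len, key=1234):
    """Blind detection: the sign of the correlation with the PN sequence decides each bit."""
    pn = _pn_sequence(chip_len, key)
    coeffs = np.asarray(coeffs, dtype=float)
    return [int(coeffs[i * chip_len:(i + 1) * chip_len] @ pn > 0) for i in range(n_bits)]
```

Because detection is blind, the host coefficients act as interference in the correlation. A longer `chip_len` raises the watermark term linearly but the host term only with its square root, which is why spreading buys robustness.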
Quality Measure Functions for Calibration of Speaker Recognition Systems in Various Duration Conditions
Pub Date : 2013-11-01 DOI: 10.1109/TASL.2013.2279332
M. I. Mandasari, R. Saeidi, Mitchell McLaren, D. V. Leeuwen
This paper investigates the effect of utterance duration on the calibration of a modern i-vector speaker recognition system with probabilistic linear discriminant analysis (PLDA) modeling. To deal with these effects, a calibration approach is proposed that uses quality measure functions (QMFs) to include duration in the calibration transformation. Extensive experiments are performed to evaluate the robustness of the proposed calibration approach to conditions unseen when training the calibration parameters. Using the latest NIST corpora for evaluation, the results highlight the importance of considering quality metrics such as duration when calibrating the scores of automatic speaker recognition systems.
Citations: 77
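Duration-aware calibration in the spirit of the QMF paper can be sketched as linear logistic-regression calibration with one extra input, so that calibrated LLR = w0 + w1·score + w2·q(durations). The specific quality measure used here, the log of the shorter of the enrollment and test durations, is an assumption for illustration. So is the plain (unweighted) logistic loss. The abstract does not give the paper's exact QMFs or training objective.

```python
import numpy as np


def qmf_features(scores, dur_enroll, dur_test):
    """Design matrix [1, score, q] with a hypothetical QMF q = log(min duration in seconds)."""
    q = np.log(np.minimum(dur_enroll, dur_test))
    return np.column_stack([np.ones_like(scores), scores, q])


def logistic_loss(w, X, labels):
    """Mean log(1 + exp(-label * llr)) with labels in {-1, +1} and llr = X @ w."""
    return float(np.mean(np.logaddexp(0.0, -labels * (X @ w))))


def fit_calibration(X, labels, n_iter=2000):
    """Gradient descent with step 1/L, where L = max eig(X^T X) / (4n) bounds the curvature."""
    n = X.shape[0]
    lr = 4.0 * n / np.linalg.eigvalsh(X.T @ X).max()
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = X @ w
        # d/dz log(1 + exp(-y z)) = -y / (1 + exp(y z))
        grad = X.T @ (-labels / (1.0 + np.exp(labels * z))) / n
        w -= lr * grad
    return w
```

Calibrated log-likelihood ratios for new trials would then be `qmf_features(s, de, dt) @ w`. The step size 1/L guarantees that each step does not increase this convex loss.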
Semi-Blind Noise Extraction Using Partially Known Position of the Target Source
Pub Date : 2013-10-01 DOI: 10.1109/TASL.2013.2264674
Zbyněk Koldovský, J. Málek, P. Tichavský, F. Nesta
An extracted noise signal provides important information for the subsequent enhancement of a target signal. When the target's position is fixed, the noise extractor can be a target-cancellation filter derived in a noise-free situation. In this paper, we consider the situation in which such cancellation filters are prepared in advance for a set of several possible target positions. The set of filters is interpreted as prior information available for noise extraction when the target's exact position is unknown. Our novel method looks for a linear combination of the prepared filters via independent component analysis (ICA). The method yields a filter with better cancellation performance than the individual filters or filters based on a minimum variance principle. The method is tested in a highly noisy and reverberant real-world environment with a moving target source and interferers. Post-processing with a Wiener filter that uses the noise signal extracted by the method improves the signal-to-noise ratio of the target by up to 8 dB.
Citations: 31
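The Wiener post-processing step mentioned in the semi-blind noise extraction paper can be sketched as a per-bin gain computed from the STFT of the noisy mixture and the STFT of the extracted noise signal. The power-subtraction form G = 1 - |N|^2 / |Y|^2, clipped to a floor, is one common Wiener-type gain and an assumption here. The ICA-based filter combination itself is not shown, and the STFTs are assumed to be computed elsewhere.

```python
import numpy as np


def wiener_gain(mix_spec, noise_spec, floor=0.1, eps=1e-12):
    """Per-bin gain G = 1 - |N|^2 / |Y|^2, clipped to [floor, 1].

    mix_spec:   STFT of the noisy target recording (complex or magnitude)
    noise_spec: STFT of the extracted noise signal, same shape
    """
    mix_pow = np.abs(mix_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    gain = 1.0 - noise_pow / (mix_pow + eps)   # estimated speech-power fraction per bin
    return np.clip(gain, floor, 1.0)           # floor limits musical-noise artifacts
```

The enhanced spectrum would then be `wiener_gain(Y, N) * Y`, followed by an inverse STFT. A better noise estimate from the extraction stage translates directly into a more accurate gain.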
Supervised and Unsupervised Speech Enhancement Using Nonnegative Matrix Factorization
Pub Date : 2013-10-01 DOI: 10.1109/TASL.2013.2270369
N. Mohammadiha, P. Smaragdis, A. Leijon
Reducing interference noise in a monaural noisy speech signal has been a challenging task for many years. Compared with traditional unsupervised speech enhancement methods, e.g., Wiener filtering, supervised approaches, such as algorithms based on hidden Markov models (HMMs), yield higher-quality enhanced speech signals. However, the main practical difficulty of these approaches is that a model must be trained a priori for each noise type. In this paper, we investigate a new class of supervised speech denoising algorithms using nonnegative matrix factorization (NMF). We propose a novel speech enhancement method based on a Bayesian formulation of NMF (BNMF). To circumvent the mismatch problem between the training and testing stages, we propose two solutions. First, we use an HMM in combination with BNMF (BNMF-HMM) to derive a minimum mean square error (MMSE) estimator for the speech signal with no information about the underlying noise type. Second, we suggest a scheme to learn the required noise BNMF model online, which is then used to develop an unsupervised speech enhancement system. Extensive experiments are carried out to investigate the performance of the proposed methods under different conditions. Moreover, we compare the performance of the developed algorithms with state-of-the-art speech enhancement schemes using various objective measures. Our simulations show that the proposed BNMF-based methods substantially outperform the competing algorithms.
Citations: 370
A Direct Masking Approach to Robust ASR
Pub Date : 2013-10-01 DOI: 10.1109/TASL.2013.2263802
William Hartmann, A. Narayanan, E. Fosler-Lussier, Deliang Wang
Recently, much work has been devoted to computing binary masks for speech segregation. Conventional wisdom in the field of ASR holds that these binary masks cannot be used directly, because the missing energy significantly affects the calculation of the cepstral features commonly used in ASR. We show that this commonly held belief may be a misconception: we demonstrate the effectiveness of directly using the masked data on both a small- and a large-vocabulary dataset. In fact, this approach, which we term the direct masking approach, performs comparably to two previously proposed missing-feature techniques. We also investigate why other researchers may not have reached this conclusion; variance normalization of the features is a significant factor in performance. This work suggests a much better baseline than unenhanced speech for future work in missing-feature ASR.
Citations: 43
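The direct masking approach summarized above applies the binary mask straight to the spectral data and then computes cepstral features as usual, with variance normalization flagged as a key factor. The following is a minimal sketch under stated assumptions, not the authors' front end. It assumes the mask is applied to a (frames × channels) array of filterbank power energies, masked-out units are floored at 1e-10 before the log, 13 cepstra come from an orthonormal DCT-II, and normalization is per-utterance mean and variance normalization. All function names are hypothetical.

```python
import numpy as np

def apply_binary_mask(power_spec, mask):
    """Zero the time-frequency units the binary mask marks as noise-dominated."""
    return power_spec * mask

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    basis[0] *= np.sqrt(1.0 / n_in)
    basis[1:] *= np.sqrt(2.0 / n_in)
    return basis

def cepstra(power_spec, n_ceps=13, floor=1e-10):
    """Log-compress (flooring masked zeros so the log stays finite), then DCT each frame."""
    log_spec = np.log(np.maximum(power_spec, floor))
    return log_spec @ dct_matrix(n_ceps, power_spec.shape[1]).T

def mean_variance_normalize(feats, eps=1e-8):
    """Per-utterance mean and variance normalization of each feature dimension."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)

def direct_masking_features(power_spec, mask, n_ceps=13):
    """Masked spectrum -> cepstra -> normalization, with no missing-feature imputation."""
    return mean_variance_normalize(cepstra(apply_binary_mask(power_spec, mask), n_ceps))
```

The floor value is an arbitrary choice in this sketch. It sets how far masked units sit from unmasked ones in the log domain, so the raw cepstra can shift and spread considerably, which is one plausible reason a variance normalization step matters here.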
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting
Pub Date : 2013-10-01 DOI: 10.1109/TASL.2013.2270407
Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, N. Minematsu, K. Hirose
This paper proposes a feature enhancement method that can achieve high speech recognition performance in a variety of noise environments with feasible computational cost. Like the well-known Stereo-based Piecewise Linear Compensation for Environments (SPLICE) algorithm, the proposed method learns a piecewise linear transformation to map corrupted feature vectors to the corresponding clean features, which enables efficient operation. To make the feature enhancement process adaptive to changes in noise, the piecewise linear transformation is performed by using a subspace of the joint space of corrupted and noise feature vectors, where the subspace is chosen such that classes (i.e., Gaussian mixture components) of underlying clean feature vectors can be best predicted. In addition, we propose utilizing temporally adjacent frames of corrupted and noise features in order to leverage dynamic characteristics of feature vectors. To prevent overfitting caused by the high dimensionality of the extended feature vectors covering the neighboring frames, we introduce a regularized weighted minimum mean square error criterion. The proposed method achieved relative improvements of 34.2% and 22.2% over SPLICE under the clean and multi-style conditions, respectively, on the Aurora 2 task.
Citations: 7
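The piecewise linear mapping described in the abstract above can be sketched in the posterior-weighted affine form used by SPLICE variants, x̂_t = Σ_k p(k | ·)(A_k y_t + b_k). This is a minimal sketch under loud assumptions, not the paper's method. Class posteriors come from a diagonal-covariance GMM whose parameters the caller supplies (the paper instead computes them on a learned subspace of the joint corrupted-and-noise space). Context is added by stacking ±`width` neighbouring corrupted frames, and the noise feature vectors are omitted. Each class's affine map is fit on stereo (corrupted, clean) data by posterior-weighted ridge regression, which stands in for the paper's regularized weighted MMSE criterion, whose exact form the abstract does not give. All function names are hypothetical.

```python
import numpy as np

def stack_context(feats, width):
    """Concatenate each frame with `width` neighbours on each side (edge frames repeated)."""
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    T = feats.shape[0]
    return np.hstack([padded[i:i + T] for i in range(2 * width + 1)])

def gmm_posteriors(z, weights, means, variances):
    """p(k | z_t) for a diagonal-covariance GMM; each row sums to one."""
    diff = z[:, None, :] - means[None, :, :]  # (T, K, D)
    log_lik = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_post = log_lik + np.log(weights)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def fit_transforms(y_ctx, x_clean, post, lam=1e-3):
    """Per-class affine maps [A_k; b_k] via posterior-weighted ridge regression."""
    Z = np.hstack([y_ctx, np.ones((y_ctx.shape[0], 1))])  # append a bias column
    thetas = []
    for k in range(post.shape[1]):
        w = post[:, k][:, None]
        gram = Z.T @ (w * Z) + lam * np.eye(Z.shape[1])  # ridge term also shrinks the bias
        thetas.append(np.linalg.solve(gram, Z.T @ (w * x_clean)))
    return np.stack(thetas)  # (K, D_in + 1, D_out)

def enhance(y_ctx, post, thetas):
    """x_hat_t = sum_k p(k | .) (A_k y_t + b_k)."""
    Z = np.hstack([y_ctx, np.ones((y_ctx.shape[0], 1))])
    per_class = np.einsum("td,kdo->tko", Z, thetas)  # (T, K, D_out)
    return np.einsum("tk,tko->to", post, per_class)
```

Stacking neighbours multiplies the input dimension by 2·width + 1. The `lam` term is what keeps each per-class solve well conditioned when a class receives little posterior mass, which mirrors the overfitting concern the abstract raises.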
Journal
IEEE Transactions on Audio Speech and Language Processing