
Latest publications: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Design of arbitrary delay filterbank having arbitrary order for audio applications
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701886
A. Vijayakumar, A. Makur
The literature shows that the design criteria of pth-order analysis with qth-order synthesis filters (p ≠ q), combined with the flexibility to control the system delay, have never been addressed concomitantly. In this paper, we propose a systematic design for a filterbank of (p, q) order that can have arbitrary delay. Such filterbanks play an important role especially in applications where low-delay, high-quality signals are required, such as digital hearing aids.
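The delay bookkeeping the paper proposes to control can be illustrated on the simplest possible case: a two-channel Haar filterbank, whose analysis/synthesis chain reconstructs the input with a system delay of exactly one sample. This is only a minimal sketch of how filter choices fix the overall delay, not the paper's (p, q)-order design.

```python
import numpy as np

# Haar two-channel filterbank (NOT the paper's (p,q)-order design):
# a minimal sketch of how the analysis/synthesis filters determine
# the overall system delay.
h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # analysis low-pass
h1 = np.array([1.0, -1.0]) / np.sqrt(2)  # analysis high-pass
f0 = np.array([1.0, 1.0]) / np.sqrt(2)   # synthesis low-pass
f1 = np.array([-1.0, 1.0]) / np.sqrt(2)  # synthesis high-pass

def analyze_synthesize(x):
    """Run x through analysis, decimation, interpolation, synthesis."""
    a = np.convolve(x, h0)[::2]              # low band, downsampled by 2
    d = np.convolve(x, h1)[::2]              # high band, downsampled by 2
    ua = np.zeros(2 * len(a)); ua[::2] = a   # upsample by 2
    ud = np.zeros(2 * len(d)); ud[::2] = d
    return np.convolve(ua, f0) + np.convolve(ud, f1)

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = analyze_synthesize(x)

# Perfect reconstruction with a system delay of exactly one sample:
system_delay = 1
assert np.allclose(y[system_delay:system_delay + len(x)], x)
```

With longer (pth- and qth-order) filters the same chain yields longer, design-dependent delays, which is precisely the degree of freedom the paper makes arbitrary.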
Citations: 1
On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701868
Jens Schröder, Niko Moritz, M. R. Schädler, Benjamin Cauchi, K. Adiloglu, J. Anemüller, S. Doclo, B. Kollmeier, Stefan Goetze
In this contribution, an acoustic event detection system based on spectro-temporal features with a two-layer hidden Markov model as back-end is proposed within the framework of the IEEE AASP challenge `Detection and Classification of Acoustic Scenes and Events' (D-CASE). Noise reduction based on the log-spectral amplitude estimator of [1] and the noise power density estimator of [2] is used for signal enhancement. Performance is compared for three different kinds of features, i.e., amplitude modulation spectrograms, Gabor filterbank features, and conventional Mel-frequency cepstral coefficients (MFCCs), all known from automatic speech recognition (ASR). The evaluation is based on the office live recordings provided within the D-CASE challenge. The influence of the signal enhancement is investigated, and the increase in recognition rate of the proposed features over MFCC features is shown. It is demonstrated that the proposed spectro-temporal features achieve better recognition accuracy than MFCCs.
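Of the three feature types compared, the MFCC baseline follows a standard pipeline: framing, power spectrum, mel-filterbank energies, log, and a DCT-II. The sketch below is a textbook implementation with assumed parameters (16 kHz, 512-point FFT, 26 mel bands, 13 cepstra), not the authors' exact front-end.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters over the positive FFT bins."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                 # rising edge
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling edge
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(x, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Frame -> power spectrum -> mel energies -> log -> DCT-II."""
    frames = [x[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(x) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = spec @ mel_filterbank(n_mels, n_fft, sr).T
    logmel = np.log(mel + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return logmel @ dct.T

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
feats = mfcc(x)   # one 13-dim cepstral vector per frame
```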
Citations: 84
Ensemble learning for speech enhancement
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701888
Jonathan Le Roux, Shinji Watanabe, J. Hershey
Over the years, countless algorithms have been proposed to solve the problem of speech enhancement from a noisy mixture. Many have succeeded in improving at least parts of the signal, while often deteriorating others. Based on the assumption that different algorithms are likely to enjoy different strengths and suffer from different flaws, we investigate the possibility of combining multiple speech enhancement algorithms, formulating the problem in an ensemble learning framework. As a first example of such a system, we consider the prediction of a time-frequency mask obtained from the clean speech, based on the outputs of various algorithms applied to the noisy mixture. We consider several approaches involving various notions of context, with machine learning algorithms used for classification in the case of binary masks and for regression in the case of continuous masks. We show that combining several algorithms in this way can lead to an improvement in enhancement performance, whereas simple averaging or voting techniques fail to do so.
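The binary-mask case can be sketched as per-unit classification: each time-frequency unit gets a feature vector of mask estimates from several enhancement algorithms, and a learned combiner predicts the ideal binary mask. The setup below is entirely synthetic and illustrative (the paper uses real enhancers and richer context features); the combiner here is a plain logistic regression trained by gradient descent.

```python
import numpy as np

# Hypothetical setup: 3 "algorithms" emit noisy soft mask estimates per
# T-F unit; a logistic-regression combiner predicts the ideal binary mask.
rng = np.random.default_rng(1)
n_units, K = 2000, 3
ideal = rng.integers(0, 2, n_units)        # ideal binary mask (targets)
# Simulated algorithm outputs, each with a different error level:
est = np.clip(ideal[:, None]
              + rng.normal(0, [0.3, 0.5, 0.8], (n_units, K)), 0, 1)

X = np.hstack([est, np.ones((n_units, 1))])   # add bias column
w = np.zeros(K + 1)
for _ in range(500):                          # batch gradient descent
    p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid
    w -= 0.1 * X.T @ (p - ideal) / n_units

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
accuracy = np.mean(pred == ideal)
assert accuracy > 0.8   # the combiner recovers most T-F units
```

The learned weights effectively trust the more reliable estimators more, which is the intuition behind combining algorithms rather than averaging them.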
Citations: 28
Tracking pitch period using particle filters
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701846
Geliang Zhang, S. Godsill
Pitch tracking is used in many speech processing applications. Most current time-domain techniques for pitch estimation rely on autocorrelation methods or the average magnitude difference function. This paper aims to track the pitch period of speech using the particle filter approach. A simple model is proposed to capture the pitch-period variations of noisy speech during voiced segments. The performance of the proposed method is compared with standard pitch detection algorithms. Simulation results show that the proposed method can track the pitch period even when strong noise is present, suggesting that the particle filter approach could be an alternative way to address the pitch tracking problem.
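A bootstrap particle filter on a random-walk pitch-period model captures the core mechanism. This is an illustrative stand-in with made-up noise levels and synthetic per-frame measurements, not the paper's model or its observation likelihood.

```python
import numpy as np

# Bootstrap particle filter on a random-walk pitch-period model
# (illustrative stand-in for the paper's model).
rng = np.random.default_rng(0)
T, n_particles = 100, 500
true_pitch = 100 + np.cumsum(rng.normal(0, 0.5, T))  # drifting period (samples)
obs = true_pitch + rng.normal(0, 2.0, T)             # noisy per-frame measurements

particles = rng.uniform(50, 200, n_particles)        # prior over pitch periods
sigma_q, sigma_r = 1.0, 2.0                          # process / observation noise
estimates = np.empty(T)
for t in range(T):
    particles = particles + rng.normal(0, sigma_q, n_particles)  # propagate
    w = np.exp(-0.5 * ((obs[t] - particles) / sigma_r) ** 2)     # likelihood
    w /= w.sum()
    estimates[t] = np.sum(w * particles)                         # posterior mean
    idx = rng.choice(n_particles, n_particles, p=w)              # resample
    particles = particles[idx]

rmse = np.sqrt(np.mean((estimates - true_pitch) ** 2))
assert rmse < 3.0   # tracks the drifting period to within a few samples
```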
Citations: 8
Average output SNR of the multichannel Wiener filter using statistical room acoustics
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701813
Toby Christian Lawin-Ore, S. Doclo
The performance of the multichannel Wiener filter (MWF), which is often used for noise reduction in speech enhancement applications, depends on the noise field and on the acoustic transfer functions (ATFs) between the desired source and the microphone array. Recently, using statistical room acoustics, an analytical expression for the spatially averaged output SNR of the MWF in a diffuse noise field has been derived, given the relative distance between the source and the microphone array, requiring only the room properties to be known. In this paper, we show that this analytical expression can be extended to compute the average output SNR of the MWF for a specific microphone configuration, enabling a comparison of the performance of different microphone configurations, e.g. in an acoustic sensor network. Simulation results show that the average output SNR obtained using the statistical properties of the ATFs is similar to that obtained using simulated ATFs, thus providing an efficient way to compare different microphone configurations.
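The quantity being averaged is the classic narrowband MWF output SNR for a rank-1 desired source, φ_s·aᴴΦ_v⁻¹a. The sketch below evaluates it for an assumed random ATF vector and spatially white noise (where it reduces to φ_s‖a‖²/σ_v); the paper's contribution, the spatially averaged closed form over room realizations, is not reproduced here.

```python
import numpy as np

# Narrowband MWF output SNR for a rank-1 desired source (illustrative;
# the ATF vector 'a' here is random, not derived from room acoustics).
rng = np.random.default_rng(0)
M = 4                                    # number of microphones
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # assumed ATF vector
phi_s = 2.0                              # desired-source PSD
sigma_v = 0.5                            # noise PSD (spatially white here)
Phi_v = sigma_v * np.eye(M)              # noise covariance matrix

# Output SNR of the MWF, independent of the chosen reference microphone:
snr_out = phi_s * np.real(a.conj() @ np.linalg.solve(Phi_v, a))

# For spatially white noise this reduces to phi_s * ||a||^2 / sigma_v:
assert np.isclose(snr_out, phi_s * np.linalg.norm(a) ** 2 / sigma_v)
```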
Citations: 0
Tapping-noise suppression with magnitude-weighted phase-based detection
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701871
A. Sugiyama, Ryoji Miyahara
This paper proposes tapping-noise suppression with a new phase-based detection. The phase slope of the noisy input signal is compared with an ideal phase slope obtained by averaging intra-frame slopes along the frequency axis. To cope with the heavily low-pass characteristic of the tapping-noise spectrum, the phase values are weighted by the magnitude at each frequency point. The phase-unwrapping problem is alleviated by the use of a rotation vector of frequency-domain components. Comparison of the enhanced signal's spectrogram with that of clean speech demonstrates superior enhanced-signal quality.
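The detection principle can be sketched on an idealized frame: an impulse-like transient delayed by d samples has phase falling linearly at -2πd/N per bin, so a magnitude-weighted average of the intra-frame phase differences recovers d. Forming each difference as the angle of X[k+1]·conj(X[k]) (a rotation vector) sidesteps explicit unwrapping. This illustrates the principle only, not the paper's full detector.

```python
import numpy as np

# Magnitude-weighted phase-slope estimation on an idealized frame:
# an impulse delayed by d samples (a crude stand-in for a tap transient).
N, d = 64, 3
x = np.zeros(N); x[d] = 1.0
X = np.fft.fft(x)
mag = np.abs(X)

# Rotation-vector trick: the angle of X[k+1]*conj(X[k]) gives the
# per-bin phase difference directly, avoiding explicit unwrapping.
dphi = np.angle(X[1:] * np.conj(X[:-1]))
w = mag[:-1] * mag[1:]                   # magnitude weighting per bin pair
slope = np.sum(w * dphi) / np.sum(w)     # weighted average phase slope
d_est = -slope * N / (2 * np.pi)         # recovered delay in samples
assert abs(d_est - d) < 1e-6
```

The magnitude weighting matters for real tapping noise because its low-pass spectrum makes high-frequency phase values unreliable.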
Citations: 10
A multichannel Wiener filter with partial equalization for distributed microphones
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701874
S. Stenzel, Toby Christian Lawin-Ore, J. Freudenberger, S. Doclo
In speech enhancement applications, the multichannel Wiener filter (MWF) is widely used to reduce noise and thus improve signal quality. The MWF performs noise reduction by estimating the desired signal component in one of the microphones, referred to as the reference microphone. However, for distributed microphones, the selection of the reference microphone has a significant impact on the broadband output SNR of the MWF, which largely depends on the acoustic transfer function (ATF) between the desired source and the reference microphone. In this paper, a multichannel Wiener filtering approach using a soft combined reference is presented. Simulation results show that the proposed scheme leads to a higher broadband output SNR than an arbitrarily selected reference microphone, while also achieving a partial equalization of the overall acoustic system.
Citations: 9
Keynote addresses: From auditory masking to binary classification: Machine learning for speech separation
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701900
Deliang Wang
Summary form only given. Speech separation, or the cocktail party problem, is a widely acknowledged challenge. Part of the challenge stems from confusion over what the computational goal should be. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon, we have suggested instead the ideal time-frequency binary mask as a main goal for computational auditory scene analysis. This leads to a new formulation of speech separation that classifies time-frequency units into two classes: those dominated by the target speech, and the rest. In supervised learning, a paramount issue is generalization to conditions unseen during training. I describe novel methods to deal with the generalization issue where support vector machines (SVMs) are used to estimate the ideal binary mask. One method employs distribution fitting to adapt to unseen signal-to-noise ratios and iterative voice activity detection to adapt to unseen noises. Another method learns more linearly separable features using deep neural networks (DNNs) and then couples a DNN with a linear SVM for training on a variety of noisy conditions. Systematic evaluations show high-quality separation in new acoustic environments.
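The ideal binary mask (IBM) that serves as the computational goal here has a simple definition: a time-frequency unit is labeled 1 when the target dominates the interference by more than a local SNR criterion (0 dB below), else 0. A minimal sketch on toy magnitude spectrograms:

```python
import numpy as np

# Ideal binary mask on toy spectrograms: label a T-F unit 1 when the
# target magnitude exceeds the interference by the 0 dB criterion.
rng = np.random.default_rng(0)
F, T = 129, 50                            # frequency bins x frames
S = rng.rayleigh(1.0, (F, T))             # |target| magnitudes (toy)
V = rng.rayleigh(1.0, (F, T))             # |interference| magnitudes (toy)

local_snr_db = 20 * np.log10(S / V)       # per-unit SNR in dB
ibm = (local_snr_db > 0.0).astype(float)  # 0 dB local criterion

# Applying the IBM to a crude magnitude mixture keeps only the
# target-dominated units:
mixture = S + V
masked = ibm * mixture
assert ibm.shape == (F, T)
```

In the classification formulation described above, a learner (SVM or DNN+SVM) is trained to predict exactly this 0/1 label per unit from noisy-mixture features.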
Citations: 1
Identifying salient sounds using dual-task experiments
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701865
Varinthira Duangudom, David V. Anderson
Auditory saliency refers to the characteristics of a sound that cause it to attract the attention of a listener. Pre-attentive or bottom-up saliency has to do with automatic processing in the human auditory system that does not require and often precedes attention. Unlike visual saliency, where eye-tracking is a commonly used evaluation method, with auditory saliency, there is no easily trackable physical correlate that can be used for evaluation. Other auditory saliency models [1, 2] have been evaluated using tests that did not specifically target bottom-up saliency. In this paper, we present a method to conclusively isolate bottom-up auditory saliency. There are also several important applications to bottom-up saliency in auditory scene analysis, auditory display design and analysis, and speech processing.
听觉显著性指的是声音能够吸引听者注意的特征。预先注意或自下而上的显著性与人类听觉系统中的自动处理有关,这种自动处理不需要注意,而且经常先于注意。与视觉显著性不同,眼动追踪是一种常用的评估方法,听觉显著性没有容易追踪的物理关联可用于评估。其他听觉显著性模型[1,2]已被评估,使用的测试并没有专门针对自下而上的显著性。在本文中,我们提出了一种方法来最终分离自下而上的听觉显著性。自下而上显著性在听觉场景分析、听觉显示设计和分析以及语音处理中也有一些重要的应用。
{"title":"Identifying salient sounds using dual-task experiments","authors":"Varinthira Duangudom, David V. Anderson","doi":"10.1109/WASPAA.2013.6701865","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701865","url":null,"abstract":"Auditory saliency refers to the characteristics of a sound that cause it to attract the attention of a listener. Pre-attentive or bottom-up saliency has to do with automatic processing in the human auditory system that does not require and often precedes attention. Unlike visual saliency, where eye-tracking is a commonly used evaluation method, with auditory saliency, there is no easily trackable physical correlate that can be used for evaluation. Other auditory saliency models [1, 2] have been evaluated using tests that did not specifically target bottom-up saliency. In this paper, we present a method to conclusively isolate bottom-up auditory saliency. There are also several important applications to bottom-up saliency in auditory scene analysis, auditory display design and analysis, and speech processing.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133427678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Employing moments of multiple high orders for high-resolution underdetermined DOA estimation based on MUSIC
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701866
Yuya Sugimoto, S. Miyabe, Takeshi Yamada, S. Makino, B. Juang
Several extensions of the MUltiple SIgnal Classification (MUSIC) algorithm exploiting high order statistics were proposed to estimate directions of arrival (DOAs) with high resolution in underdetermined conditions. However, these methods entail a trade-off between two performance goals, namely, robustness and resolution, in the choice of orders because use of high-ordered statistics increases not only the resolution but also the statistical bias. To overcome this problem, this paper proposes a new extension of MUSIC using a nonlinear high-dimensional map, which corresponds to the joint analysis of moments of multiple orders and helps to realize the both advantages of robustness and high resolution of low-ordered and high-ordered statistics. Experimental results show that the proposed method can estimate DOAs more accurately than the conventional MUSIC extensions exploiting moments of a single high order.
提出了几种利用高阶统计量的多信号分类(MUSIC)算法的扩展,用于欠确定条件下高分辨率估计到达方向(DOAs)。然而,在选择顺序时,这些方法需要在两个性能目标(即鲁棒性和分辨率)之间进行权衡,因为使用高阶统计不仅会增加分辨率,还会增加统计偏差。为了克服这一问题,本文提出了一种新的MUSIC扩展方法,利用非线性高维映射对应多阶矩的联合分析,实现了低阶和高阶统计量的鲁棒性和高分辨率的优势。实验结果表明,与利用单个高阶矩的传统MUSIC扩展相比,该方法可以更准确地估计doa。
{"title":"Employing moments of multiple high orders for high-resolution underdetermined DOA estimation based on MUSIC","authors":"Yuya Sugimoto, S. Miyabe, Takeshi Yamada, S. Makino, B. Juang","doi":"10.1109/WASPAA.2013.6701866","DOIUrl":"https://doi.org/10.1109/WASPAA.2013.6701866","url":null,"abstract":"Several extensions of the MUltiple SIgnal Classification (MUSIC) algorithm exploiting high order statistics were proposed to estimate directions of arrival (DOAs) with high resolution in underdetermined conditions. However, these methods entail a trade-off between two performance goals, namely, robustness and resolution, in the choice of orders because use of high-ordered statistics increases not only the resolution but also the statistical bias. To overcome this problem, this paper proposes a new extension of MUSIC using a nonlinear high-dimensional map, which corresponds to the joint analysis of moments of multiple orders and helps to realize the both advantages of robustness and high resolution of low-ordered and high-ordered statistics. Experimental results show that the proposed method can estimate DOAs more accurately than the conventional MUSIC extensions exploiting moments of a single high order.","PeriodicalId":341888,"journal":{"name":"2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134310732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4