
Latest publications from IEEE Transactions on Audio Speech and Language Processing

A High-Quality Speech and Audio Codec With Less Than 10-ms Delay
Pub Date : 2016-02-17 DOI: 10.1109/TASL.2009.2023186
J. Valin, Timothy B. Terriberry, Christopher Montgomery, Gregory Maxwell
With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that simultaneously addresses both requirements, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantization in the frequency domain with time-domain pitch prediction. We demonstrate that the proposed codec operating at 48 kb/s and 64 kb/s outperforms both G.722.1C and MP3 and has quality comparable to AAC-LD, despite having less than one fourth of the algorithmic delay of these codecs.
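The gain-shape idea at the heart of the codec (coding a vector's energy and its unit-norm direction separately) can be sketched as follows. This is a toy illustration only: the codebook, gain levels, and dimensions are hypothetical values, not the codec's actual algebraic codebooks.

```python
import numpy as np

def gain_shape_quantize(x, shape_codebook, gain_levels):
    """Quantize a vector as gain * shape: the gain (norm) and the
    unit-norm shape are coded independently."""
    gain = np.linalg.norm(x)
    # Scalar-quantize the gain to the nearest allowed level.
    g_hat = gain_levels[np.argmin(np.abs(gain_levels - gain))]
    # Pick the codebook entry most correlated with the normalized input.
    shape = x / gain if gain > 0 else x
    idx = int(np.argmax(shape_codebook @ shape))
    return g_hat * shape_codebook[idx]

# Hypothetical 2-D example with a small unit-norm shape codebook.
codebook = np.array([[1.0, 0.0], [0.0, 1.0],
                     [0.7071, 0.7071], [0.7071, -0.7071]])
x = np.array([3.1, 2.9])
y = gain_shape_quantize(x, codebook, gain_levels=np.array([1.0, 2.0, 4.0]))
```

Because gain and shape are coded separately, the reconstruction preserves signal energy (up to gain quantization) regardless of which shape entry is chosen.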
Citations: 64
Efficient Approximation of Head-Related Transfer Functions in Subbands for Accurate Sound Localization
Damián Marelli, Robert Baumgartner, Piotr Majdak

Head-related transfer functions (HRTFs) describe the acoustic filtering of incoming sounds by the human morphology and are essential for listeners to localize sound sources in virtual auditory displays. Since rendering complex virtual scenes is computationally demanding, we propose four algorithms for efficiently representing HRTFs in subbands, i.e., as an analysis filterbank (FB) followed by a transfer matrix and a synthesis FB. All four algorithms use sparse approximation procedures to minimize the computational complexity while maintaining perceptually relevant HRTF properties. The first two algorithms separately optimize the complexity of the transfer matrix associated to each HRTF for fixed FBs. The other two algorithms jointly optimize the FBs and transfer matrices for complete HRTF sets, in two variants. The first variant aims at minimizing the complexity of the transfer matrices, while the second one does it for the FBs. Numerical experiments investigate the latency-complexity trade-off and show that the proposed methods offer significant computational savings when compared with other available approaches. Psychoacoustic localization experiments were modeled and conducted to find a reasonable approximation tolerance so that no significant localization performance degradation was introduced by the subband representation.
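The subband structure (analysis filterbank, per-band transfer coefficients, synthesis filterbank) can be sketched minimally with a plain STFT and a diagonal transfer matrix; the paper's actual algorithms use optimized FBs and sparse, generally non-diagonal transfer matrices, so this is only an illustration of the pipeline.

```python
import numpy as np

def subband_filter(x, band_gains, n_fft=64, hop=32):
    """Filter a signal with per-subband complex gains (a diagonal
    transfer matrix), using an STFT as analysis/synthesis filterbank."""
    window = np.hanning(n_fft)
    out = np.zeros(len(x) + n_fft)
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * window
        spec = np.fft.rfft(frame) * band_gains   # apply the transfer matrix
        out[start:start + n_fft] += np.fft.irfft(spec) * window
    return out[:len(x)]

# Hypothetical 440 Hz tone at 16 kHz, passed through unit gains.
x = np.sin(2 * np.pi * 440 * np.arange(1024) / 16000.0)
y = subband_filter(x, band_gains=np.ones(33))
```

The complexity saving in the paper comes from making the per-band transfer description sparse, so that most of the 33 coefficients here would be pruned or shared.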

Citations: 0
Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2277937
T. Nakatani, S. Araki, Takuya Yoshioka, Marc Delcroix, M. Fujimoto
This paper proposes a versatile technique for integrating two conventional speech enhancement approaches, a spatial clustering approach (SCA) and a factorial model approach (FMA), which are based on two different features of signals, namely spatial and spectral features, respectively. When used separately the conventional approaches simply identify time frequency (TF) bins that are dominated by interference for speech enhancement. Integration of the two approaches makes identification more reliable, and allows us to estimate speech spectra more accurately even in highly nonstationary interference environments. This paper also proposes extensions of the FMA for further elaboration of the proposed technique, including one that uses spectral models based on mel-frequency cepstral coefficients and another to cope with mismatches, such as channel mismatches, between captured signals and the spectral models. Experiments using simulated and real recordings show that the proposed technique can effectively improve audible speech quality and the automatic speech recognition score.
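The notion of interference-dominated TF bins can be illustrated with a simple hard mask. The PSD values and threshold below are hypothetical, and a binary mask is only an illustration of the dominance idea, not the paper's actual SCA/FMA integration.

```python
import numpy as np

def dominance_mask(speech_psd, noise_psd, threshold=1.0):
    """Mark time-frequency bins as interference-dominated when the
    estimated local SNR falls below a threshold, and keep only the
    speech-dominated bins (a hard TF mask)."""
    snr = speech_psd / np.maximum(noise_psd, 1e-12)
    return (snr >= threshold).astype(float)

# Hypothetical 2x3 spectrogram of power values (rows: frequency bins).
S = np.array([[4.0, 0.1, 2.0], [0.5, 3.0, 0.2]])
N = np.ones((2, 3))
mask = dominance_mask(S, N)
enhanced = mask * S
```

Integrating a second feature stream, as the paper does, effectively makes this per-bin decision more reliable than either feature alone.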
Citations: 36
Linearly-Constrained Minimum-Variance Method for Spherical Microphone Arrays Based on Plane-Wave Decomposition of the Sound Field
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2277939
Yotam Peled, B. Rafaely
Speech signals recorded in real environments may be corrupted by ambient noise and reverberation. Therefore, noise reduction and dereverberation algorithms for speech enhancement are typically employed in speech communication systems. Although microphone arrays are useful in reducing the effect of noise and reverberation, existing methods have limited success in significantly removing both reverberation and noise in real environments. This paper presents a method for noise reduction and dereverberation that overcomes some of the limitations of previous methods. The method uses a spherical microphone array to achieve plane-wave decomposition (PWD) of the sound field, based on direction-of-arrival (DOA) estimation of the desired signal and its reflections. A multi-channel linearly-constrained minimum-variance (LCMV) filter is introduced to achieve further noise reduction. The PWD beamformer achieves dereverberation while the LCMV filter reduces the uncorrelated noise with a controllable dereverberation constraint. In contrast to other methods, the proposed method employs DOA estimation, rather than room impulse response identification, to achieve dereverberation, and relative transfer function (RTF) estimation between the source reflections to achieve noise reduction while avoiding signal cancellation. The paper includes a simulation investigation and an experimental study, comparing the proposed method to currently available methods.
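The LCMV filter has a standard closed-form solution: minimize the output power w^H R w subject to the linear constraints C^H w = f. A minimal sketch follows; the array size and constraint are hypothetical, and with a single distortionless constraint this reduces to the familiar MVDR beamformer.

```python
import numpy as np

def lcmv_weights(R, C, f):
    """Closed-form LCMV: w = R^-1 C (C^H R^-1 C)^-1 f."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical 4-mic array with white noise covariance and one
# distortionless constraint toward steering vector a.
R = np.eye(4)
a = np.ones((4, 1)) / 2.0
w = lcmv_weights(R, a, np.array([1.0]))
```

Additional rows of the constraint matrix C are what allow the dereverberation behavior to be controlled explicitly, as described in the abstract.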
Citations: 28
A Class of Optimal Rectangular Filtering Matrices for Single-Channel Signal Enhancement in the Time Domain
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2280215
J. Jensen, J. Benesty, M. G. Christensen, Jingdong Chen
In this paper, we introduce a new class of optimal rectangular filtering matrices for single-channel speech enhancement. The new class of filters exploits the fact that the dimension of the signal subspace is lower than that of the full space. By doing this, extra degrees of freedom in the filters, otherwise reserved for preserving the signal subspace, can be used for achieving an improved output signal-to-noise ratio (SNR). Moreover, the filters allow for explicit control of the tradeoff between noise reduction and speech distortion via the chosen rank of the signal subspace. An interesting aspect is that the framework in which the filters are derived unifies the ideas of optimal filtering and subspace methods. A number of different optimal filter designs are derived in this framework, and their properties and performance are studied using both synthetic, periodic signals and real signals. The results show how speech distortion can be traded for noise reduction and vice versa in a seamless manner. Moreover, the introduced filter designs are capable of achieving both the upper and lower bounds for the output SNR via the choice of a single parameter.
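A rank-reduced enhancement matrix can be sketched as below. This uses a generic eigenvector projection with Wiener-like per-component gains and only illustrates the rank-vs-tradeoff idea; it is not the paper's optimal rectangular filter designs, and the covariances are hypothetical.

```python
import numpy as np

def lowrank_subspace_filter(Ry, Rn, rank):
    """Build a rank-limited enhancement matrix: project onto the `rank`
    leading eigenvectors of the noisy covariance and apply a Wiener-like
    gain per component (the rank trades distortion against noise)."""
    vals, vecs = np.linalg.eigh(Ry)            # ascending eigenvalues
    vals, vecs = vals[::-1], vecs[:, ::-1]     # reorder to descending
    noise_var = np.mean(np.diag(Rn))
    gains = np.maximum(1.0 - noise_var / vals[:rank], 0.0)
    return (vecs[:, :rank] * gains) @ vecs[:, :rank].T

# Hypothetical 2-D case: one strong signal component plus white noise.
Ry = np.diag([4.0, 1.0])   # noisy-signal covariance
Rn = np.eye(2)             # noise covariance
H = lowrank_subspace_filter(Ry, Rn, rank=1)
```

Raising `rank` keeps more of the signal subspace (less distortion) but also passes more noise, which is exactly the tradeoff the abstract describes controlling explicitly.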
Citations: 10
Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2280234
G. Heigold, H. Ney, R. Schlüter
Today's speech recognition systems are based on hidden Markov models (HMMs) with Gaussian mixture models whose parameters are estimated using a discriminative training criterion such as Maximum Mutual Information (MMI) or Minimum Phone Error (MPE). Currently, the optimization is almost always done with (empirical variants of) Extended Baum-Welch (EBW). This type of optimization requires sophisticated update schemes for the step sizes and a considerable amount of parameter tuning, and little is known about its convergence behavior. In this paper, we derive an EM-style algorithm for discriminative training of HMMs. Like Expectation-Maximization (EM) for the generative training of HMMs, the proposed algorithm improves the training criterion on each iteration, converges to a local optimum, and is completely parameter-free. We investigate the feasibility of the proposed EM-style algorithm for discriminative training on two tasks, namely grapheme-to-phoneme conversion and spoken digit string recognition.
Citations: 8
Soundfield Imaging in the Ray Space
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2274697
D. Markovic, F. Antonacci, A. Sarti, S. Tubaro
In this work we propose a general approach to acoustic scene analysis based on a novel data structure (ray-space image) that encodes the directional plenacoustic function over a line segment (Observation Window, OW). We define and describe a system for acquiring a ray-space image using a microphone array and refer to it as ray-space (or “soundfield”) camera. The method consists of acquiring the pseudo-spectra corresponding to a grid of sampling points over the OW, and remapping them onto the ray space, which parameterizes acoustic paths crossing the OW. The resulting ray-space image displays the information gathered by the sensors in such a way that the elements of the acoustic scene (sources and reflectors) will be easy to discern, recognize and extract. The key advantage of this method is that ray-space images, irrespective of the application, are generated by a common (and highly parallelizable) processing layer, and can be processed using methods coming from the extensive literature of pattern analysis. After defining the ideal ray-space image in terms of the directional plenacoustic function, we show how to acquire it using a microphone array. We also discuss resolution and aliasing issues and show two simple examples of applications of ray-space imaging.
Citations: 33
Epoch Extraction Based on Integrated Linear Prediction Residual Using Plosion Index
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2273717
A. Prathosh, T. Ananthapadmanabha, A. Ramakrishnan
Epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high performance epoch extraction algorithms require either dynamic programming techniques or a priori information of the average pitch period. An algorithm without such requirements is proposed based on integrated linear prediction residual (ILPR) which resembles the voice source signal. Half wave rectified and negated ILPR (or Hilbert transform of ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) has been proposed for detecting ‘transients’ in speech signal. An extension of PI, called the dynamic plosion index (DPI) is applied on pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases which provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for its robustness in the presence of additive white and babble noise and on simulated telephone quality speech. The performance of the DPI algorithm is found to be comparable or better than five state-of-the-art techniques for the experiments considered.
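A plausible toy computation of a plosion-index-style measure is the ratio of a sample's magnitude to the average magnitude of preceding samples, so that transients stand out sharply; the window parameters below are hypothetical illustrations, not the paper's settings.

```python
import numpy as np

def plosion_index(s, n0, skip=10, avg_len=16):
    """Ratio of the sample magnitude at n0 to the mean magnitude of
    avg_len preceding samples (offset back by skip); large values flag
    transient-like events such as glottal excitation instants."""
    seg = np.abs(s[n0 - skip - avg_len : n0 - skip])
    return np.abs(s[n0]) / max(np.mean(seg), 1e-12)

# A flat low-level signal with one strong transient at n = 40.
s = np.full(64, 0.1)
s[40] = 2.0
pi = plosion_index(s, 40)
```

Applied to the pre-processed ILPR rather than the raw waveform, peaks of such a measure would line up with candidate epochs.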
Citations: 123
Body Conducted Speech Enhancement by Equalization and Signal Fusion
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2274696
Tomas Dekens, W. Verhelst
This paper studies body-conducted speech for noise robust speech processing purposes. As body-conducted speech is typically limited in bandwidth, signal processing is required to obtain a signal that is both high in quality and low in noise. We propose an algorithm that first equalizes the body-conducted speech using filters obtained from a pre-defined filter set and subsequently fuses this equalized signal with a noisy conventional microphone signal using an optimal clean speech amplitude and phase estimator. We evaluated the proposed equalization and fusion technique using a combination of a conventional close-talk and a throat microphone. Subjective listening tests show that the proposed method successfully fuses the speech quality of the conventional signal and the noise robustness of the throat microphone signal. The listening tests also indicate that the inclusion of the body-conducted signal can improve single-channel speech enhancement methods, while a calculated set of objective signal quality measures confirm these observations.
Citations: 14
Declipping of Audio Signals Using Perceptual Compressed Sensing
Pub Date : 2013-12-01 DOI: 10.1109/TASL.2013.2281570
Bruno Defraene, Naim Mansour, S. D. Hertogh, T. Waterschoot, M. Diehl, M. Moonen
The restoration of clipped audio signals, commonly known as declipping, is important for achieving an improved level of audio quality in many audio applications. In this paper, a novel declipping algorithm is presented, jointly based on the theory of compressed sensing (CS) and on well-established properties of human auditory perception. Declipping is formulated as a sparse signal recovery problem using the CS framework. By additionally exploiting knowledge of human auditory perception, a novel perceptual compressed sensing (PCS) framework is devised. A PCS-based declipping algorithm is proposed which uses $\ell_1$-norm-type reconstruction. Comparative objective and subjective evaluation experiments reveal a significant audio quality increase for the proposed PCS-based declipping algorithm compared to CS-based declipping algorithms.
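The sparse-recovery formulation can be made concrete with a small sketch: unclipped samples are kept as reliable observations, and the clipped ones are filled in by a signal that is sparse in the DCT domain. For brevity this uses greedy hard thresholding as a stand-in for the paper's $\ell_1$-norm reconstruction, and it omits the perceptual weighting that defines PCS; function and parameter names are illustrative. Assumes NumPy and SciPy.

```python
import numpy as np
from scipy.fft import dct, idct

def declip_sparse(clipped, clip_level, k_sparse=8, n_iter=100):
    """Consistency-based declipping sketch: alternate between projecting
    onto k-sparse signals in the DCT domain (hard thresholding, a greedy
    stand-in for l1-norm reconstruction) and restoring the reliable,
    unclipped samples. Clipped samples are filled in by the sparse model."""
    reliable = np.abs(clipped) < clip_level
    x = clipped.astype(float)
    for _ in range(n_iter):
        c = dct(x, norm="ortho")
        keep = np.argsort(np.abs(c))[-k_sparse:]  # k largest coefficients
        mask = np.zeros_like(c)
        mask[keep] = 1.0
        x = idct(c * mask, norm="ortho")
        x[reliable] = clipped[reliable]           # data-consistency projection
    return x
```

An $\ell_1$-norm solver (e.g. iterative soft thresholding with the same consistency projection) follows the identical alternating structure; the paper additionally constrains clipped samples to lie beyond the clipping level and weights the transform perceptually.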
Citations: 56
Journal: IEEE Transactions on Audio Speech and Language Processing