2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics: Latest Articles

Blind low-complexity estimation of reverberation time

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701875

Christian Schüldt, P. Händel

Real-time blind reverberation time estimation is of interest in speech enhancement techniques such as dereverberation and microphone beamforming. Advances in this field have been made where the diffusive reverberation tail is modeled and the decay rate is estimated using a maximum-likelihood approach. Various methods for reducing the computational complexity have also been presented. This paper proposes a method that reduces the computational complexity even further, by more than 60% in some cases, and simulations show that the results of the proposed method are very similar to those of the original.
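To make the idea of maximum-likelihood decay-rate estimation concrete, the following is a minimal sketch under a textbook assumption that is not taken from the paper: the reverberation tail is modeled as white Gaussian noise with an exponentially decaying envelope, y[n] ~ N(0, sigma^2 a^(2n)). The function names, the brute-force grid search, and the simulated signal are all illustrative assumptions; this is not the reduced-complexity method the abstract proposes.

```python
import math
import random


def decay_factor(t60, fs):
    """Per-sample amplitude decay a such that a**n falls by 60 dB after t60 seconds."""
    return math.exp(-3.0 * math.log(10.0) / (fs * t60))


def simulate_tail(t60, fs, n_samples, seed=0):
    """Hypothetical test signal: white Gaussian noise with an exponential envelope."""
    rng = random.Random(seed)
    a = decay_factor(t60, fs)
    return [a ** n * rng.gauss(0.0, 1.0) for n in range(n_samples)]


def profile_log_likelihood(y, a):
    """Log-likelihood of y[n] ~ N(0, sigma^2 * a^(2n)) with sigma^2 at its ML value."""
    n_samples = len(y)
    log_a = math.log(a)
    # ML variance estimate for this decay factor: mean of the de-enveloped squares.
    sigma2 = sum(v * v * math.exp(-2.0 * n * log_a) for n, v in enumerate(y)) / n_samples
    return (-0.5 * n_samples * math.log(2.0 * math.pi * sigma2)
            - 0.5 * n_samples * (n_samples - 1) * log_a
            - 0.5 * n_samples)


def estimate_t60(y, fs, candidates):
    """Return the candidate reverberation time with the highest profile likelihood."""
    return max(candidates,
               key=lambda t60: profile_log_likelihood(y, decay_factor(t60, fs)))
```

A call such as `estimate_t60(simulate_tail(0.5, 8000, 2000), 8000, [0.1 + 0.01 * k for k in range(141)])` searches reverberation times between 0.1 s and 1.5 s. The exhaustive grid search is chosen here only for clarity; its cost is exactly the kind of burden that low-complexity estimators aim to avoid.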

Citations: 4

The influence of informational masking in complex real-world environments

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701873

Adam Westermann, J. Buchholz

Spatial release from masking (SRM) is believed to be an essential auditory mechanism aiding listeners in reverberant multi-talker environments. However, SRM is often measured in simplified spatial configurations using speech corpora with exaggerated talker and/or context confusions. Besides energetic better-ear listening and binaural unmasking, the perceived spatial separation of target and masking speech signals is thought to aid listeners' segregation of speech signals, resulting in a so-called release from informational masking. This study aims to estimate the amount of informational masking that is apparent in complex real-world environments. Speech reception thresholds (SRTs) were measured by presenting Bamford-Kowal-Bench (BKB) sentences in a simulated cafeteria environment recreated by a spherical array of 41 loudspeakers placed in an anechoic chamber. Three maskers with varying degrees of informational masking were realized: one with talkers different from the target, one with an unintelligible noise vocoder (minimal informational masking), and one with the same talker as the target (maximum informational masking). The maskers were constructed with either two or seven two-talker conversations and were either spatially distributed in the simulated cafeteria or colocated with the target. Seven normal-hearing listeners were tested. All conditions showed improved thresholds for the spatialized condition compared to the colocated condition. However, there was no significant difference between the different-talker speech and vocoded maskers. Only the same-talker masker showed increased thresholds, and this was only substantial in the two-conversation colocated condition. These results suggest that informational masking is of low relevance in real-life listening and is exaggerated in listening tests by target/masker similarities and the colocated spatial configuration. However, this may be different in (aided) hearing-impaired listeners, where spectral and spatial cues can be significantly disturbed.

Citations: 0

Low-artifact source separation using probabilistic latent component analysis

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701837

N. Mohammadiha, P. Smaragdis, A. Leijon

We propose a method based on the probabilistic latent component analysis (PLCA) in which we use exponential distributions as priors to decrease the activity level of a given basis vector. A straightforward application of this method is when we try to extract a desired source from a mixture with low artifacts. For this purpose, we propose a maximum a posteriori (MAP) approach to identify the common basis vectors between two sources. A low-artifact estimate can now be obtained by using a constraint such that the common basis vectors in the interfering signal's dictionary tend to remain inactive. We discuss applications of this method in source separation with similar-gender speakers and in enhancing a speech signal that is contaminated with babble noise. Our simulations show that the proposed method not only reduces the artifacts but also increases the overall quality of the estimated signal.

Citations: 7

Closed-form solutions for robust acoustic sensor localization

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701810

D. B. Haddad, Leonardo O. Nunes, W. Martins, L. Biscainho, Bowon Lee

This paper deals with the localization of acoustic sensors based on signals emitted by loudspeakers at known positions. In particular, a model for distortions in time-of-flight (TOF) estimates applicable to the sensor localization problem is presented along with closed-form solutions with low computational cost. The proposed techniques are able to approximate the sensor position even when the TOFs are corrupted by an unknown delay, there is a sampling frequency mismatch between the A/D and D/A converters associated with sensor and loudspeakers, and the speed of sound is unknown. Simulations and an experiment on real data demonstrate that the proposed methods are able to estimate sensor positions with less than 2 cm of error in the evaluated scenarios.

Citations: 9

Rate-distortion optimization for multichannel audio compression

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701839

Minyue Li, J. Skoglund, W. Kleijn

Multichannel audio coding is studied from a rate-distortion theoretical viewpoint. Two practical coding techniques, both of which are based on rate-distortion optimization, are also proposed. The first technique decorrelates a multichannel signal hierarchically using elementary unitary transforms. The second method rearranges a multichannel signal into sub-signals and compresses them at optimized bit rates using a conventional codec. Both objective and subjective tests were conducted to illustrate the efficiency of the methods.

Citations: 0

Spotforming using distributed microphone arrays

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701876

Maja Taseska, Emanuël Habets

Extracting sounds that originate from a specific location, while reducing noise and interferers, is required in many hands-free communication systems. We propose a spotforming approach that uses distributed microphone arrays and aims at extracting sounds that originate from a pre-defined spot of interest (SOI), while reducing background noise and sounds that originate from outside the SOI. The spotformer is realized as a linear spatial filter, which is based on the signal statistics of sounds from the SOI, the signal statistics of sounds outside the SOI, and the background noise signal statistics. The required signal statistics are estimated from the microphone signals, while taking into account the uncertainty in the location estimates of the desired and the interfering sound sources. The applicability of the method is demonstrated by simulations, and the quality of the extracted signal is evaluated in different scenarios.

Citations: 16

Evaluating how well filtered white noise models the residual from sinusoidal modeling of musical instrument sounds

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701840

Marcelo F. Caetano, George P. Kafentzis, G. Degottex, A. Mouchtaris, Y. Stylianou

Nowadays, sinusoidal modeling commonly includes a residual obtained by subtracting the sinusoidal model from the original sound. This residual signal is often further modeled as filtered white noise. In this work, we evaluate how well filtered white noise models the residual from sinusoidal modeling of musical instrument sounds for several sinusoidal algorithms. We compare how well each sinusoidal model captures the oscillatory behavior of the partials by looking into how "noisy" their residuals are. We performed a listening test to evaluate the perceptual similarity between the original residual and the modeled counterpart. We then further investigate whether the result of the listening test can be explained by the fine structure of the residual magnitude spectrum. The results presented here have the potential to inform improvements in residual modeling.

Citations: 5

Gentle acoustic crosstalk cancelation using the spectral division method and Ambiophonics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701827

J. Ahrens, Mark R. P. Thomas, I. Tashev

We propose the concept of gentle acoustic crosstalk cancelation, which aims at reducing the crosstalk between a loudspeaker and the listener's contralateral ear instead of eliminating it completely, as aggressive methods intend to do. The expected benefit is higher robustness and a tendency to collapse less unpleasantly. The proposed method employs a linear loudspeaker array and comprises two stages: 1) Use the Spectral Division Method to illuminate the ipsilateral ear using constructive interference of the loudspeaker signals. This approach provides little channel separation between the listener's ears at frequencies below approximately 2000 Hz. 2) At those frequencies, we additionally use destructive interference by Recursive Ambiophonics Crosstalk Elimination (RACE). RACE was chosen because of its tendency to collapse gently. In a sample scenario with realistic parameters, the proposed method achieves around 20 dB of channel separation between 700 Hz and 9000 Hz, which appears to be sufficient to achieve full perceived lateralization when only one ear is illuminated.

Citations: 4

Recurrence quantification analysis features for environmental sound recognition

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701890

Gerard Roma, Waldo Nogueira, P. Herrera

This paper tackles the problem of feature aggregation for recognition of auditory scenes in unlabeled audio. We describe a new set of descriptors based on Recurrence Quantification Analysis (RQA), which can be extracted from the similarity matrix of a time series of audio descriptors. We analyze their usefulness for environmental audio recognition combined with traditional feature statistics in the context of the AASP D-CASE[1] challenge. Our results show the potential of non-linear time series analysis techniques for dealing with environmental sounds.
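As a rough illustration of how RQA descriptors can be read off a similarity matrix, the sketch below thresholds pairwise distances between feature frames into a binary recurrence matrix and computes two common RQA measures, recurrence rate and determinism. The function names, the Euclidean distance, the fixed radius, and the exact normalization conventions are assumptions made for this example; the abstract does not specify which measures or settings the authors use.

```python
import math


def recurrence_matrix(frames, radius):
    """Binary matrix with R[i][j] = 1 when frames i and j lie within `radius`."""
    n = len(frames)
    return [[1 if math.dist(frames[i], frames[j]) <= radius else 0
             for j in range(n)]
            for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrent points in the whole matrix (main diagonal included)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)


def determinism(R, min_len=2):
    """Share of off-diagonal recurrent points lying on diagonal lines >= min_len.

    Only the upper triangle is scanned, since R is symmetric.
    """
    n = len(R)
    total = 0
    on_lines = 0
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                total += 1
                run += 1
            else:
                if run >= min_len:
                    on_lines += run
                run = 0
        if run >= min_len:  # close a line that reaches the matrix edge
            on_lines += run
    return on_lines / total if total else 0.0
```

For a strictly periodic sequence such as `[(0.0,), (1.0,), (2.0,)] * 3` with `radius=0.5`, every recurrent point falls on a long diagonal, so determinism is at its maximum; noisy, non-repeating input tends to give lower values. Scalar summaries of this kind are what make the matrix usable as a fixed-length feature vector.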

Citations: 52

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701855

J. Kauppinen, Anssi Klapuri, T. Virtanen

Self-similarity matrices have been widely used to analyze the sectional form of music signals, e.g. enabling the detection of parts such as verse and chorus in popular music. Two main types of structures often appear in self-similarity matrices: rectangular blocks of high similarity and diagonal stripes off the main diagonal that represent recurrent sequences. In this paper, we introduce a novel method to model both the block and stripe-like structures in self-similarity matrices and to pull them apart from each other. The model is an extension of the nonnegative matrix factorization, for which we present multiplicative update rules based on the generalized Kullback-Leibler divergence. The modeling power of the proposed method is illustrated with examples, and we demonstrate its application to the detection of sectional boundaries in music.
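For orientation, the following sketch shows the standard multiplicative update rules for plain nonnegative matrix factorization under the generalized Kullback-Leibler divergence, which is the baseline the abstract says the model extends. It does not implement the paper's block-and-stripe extension. The function names and the small `eps` guard are assumptions of this example, and pure-Python lists are used only to keep the block dependency-free.

```python
import math
import random


def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def ratio(V, WH, eps):
    """Element-wise V / (WH + eps)."""
    return [[v / (wh + eps) for v, wh in zip(v_row, wh_row)]
            for v_row, wh_row in zip(V, WH)]


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    total = 0.0
    for v_row, wh_row in zip(V, WH):
        for v, wh in zip(v_row, wh_row):
            if v > 0:
                total += v * math.log(v / (wh + eps))
            total += wh - v
    return total


def nmf_kl(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factor a nonnegative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (column sums of W)
        R = ratio(V, matmul(W, H), eps)
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n)) + eps
            for j in range(m):
                H[k][j] *= sum(W[i][k] * R[i][j] for i in range(n)) / col_sum
        # W <- W * ((V / WH) H^T) / (row sums of H)
        R = ratio(V, matmul(W, H), eps)
        for k in range(rank):
            row_sum = sum(H[k]) + eps
            for i in range(n):
                W[i][k] *= sum(R[i][j] * H[k][j] for j in range(m)) / row_sum
    return W, H
```

Applied to a small self-similarity matrix with two high-similarity blocks, a rank-2 factorization tends to assign one component per block, which hints at why NMF suits the block structure. Stripe-like structures are not captured by this plain form, which is presumably the motivation for the extension described above.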

Citations: 5
