2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics: Latest Papers

Using articulation index band correlations to objectively estimate speech intelligibility consistent with the modified rhyme test

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-23 DOI: 10.1109/WASPAA.2013.6701826

S. Voran

We present an objective estimator of speech intelligibility that follows the paradigm of the Modified Rhyme Test (MRT). For each input, the estimator uses temporal correlations within articulation index bands to select one of six possible words from a list. The rate of successful word identification becomes the measure of speech intelligibility, as in the MRT. The estimator is called Articulation Band Correlation MRT (ABC-MRT). It consumes a tiny fraction of the resources required by MRT testing. ABC-MRT has been tested on a wide range of impaired speech recordings unseen during development. The resulting Pearson correlations between ABC-MRT and MRT results range from .95 to .99. These values exceed those of the other estimators tested.
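The word-selection and scoring steps described above can be illustrated with a minimal Python sketch. It assumes that per-band temporal envelopes are already available for the input and for each candidate word. The function names (`pearson`, `select_word`, `success_rate`), the averaging of band correlations, and the data layout are illustrative assumptions, not the ABC-MRT implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_word(input_bands, candidates):
    """Pick the candidate whose band envelopes correlate best with the input.

    input_bands: list of per-band envelope sequences (hypothetical layout).
    candidates:  dict mapping each word to its own list of band envelopes.
    """
    def score(template_bands):
        # Average the temporal correlation over all bands (an assumption).
        corrs = [pearson(a, b) for a, b in zip(input_bands, template_bands)]
        return sum(corrs) / len(corrs)

    return max(candidates, key=lambda word: score(candidates[word]))


def success_rate(selected, targets):
    """Fraction of trials in which the selected word equals the spoken word."""
    return sum(s == t for s, t in zip(selected, targets)) / len(targets)
```

In an MRT-style evaluation each trial would offer six candidate words; the success rate over many trials is then the intelligibility score that is compared against listener results.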

Citations: 10

Roomprints for forensic audio applications

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-20 DOI: 10.1109/WASPAA.2013.6701854

Alastair H. Moore, M. Brookes, P. Naylor

A roomprint is a quantifiable description of an acoustic environment that can be measured under controlled conditions and estimated from a monophonic recording made in that space. Here we identify the properties required of a roomprint in forensic audio applications and review the observable characteristics of a room that, when extracted from recordings, could form the basis of a roomprint. Frequency-dependent reverberation time is investigated as a promising characteristic and is used in a room identification experiment, giving correct identification in 96% of trials.
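As background for the reverberation-time feature, the following Python sketch shows one standard way to measure reverberation time under controlled conditions: Schroeder backward integration of a measured impulse response followed by a line fit. It does not show the estimation from a recording that the abstract refers to. The function name, the fitting range of -5 to -25 dB, and the assumption that the impulse response has already been filtered into the frequency band of interest are all illustrative choices.

```python
import math


def rt60_from_impulse_response(h, fs, upper_db=-5.0, lower_db=-25.0):
    """Estimate reverberation time (s) from an impulse response.

    Uses Schroeder backward integration, fits a line to the decay curve
    between upper_db and lower_db, and extrapolates to a 60 dB decay.
    """
    # Energy decay curve: energy remaining from each sample onwards.
    edc = []
    remaining = 0.0
    for sample in reversed(h):
        remaining += sample * sample
        edc.append(remaining)
    edc.reverse()
    if not edc or edc[0] <= 0.0:
        raise ValueError("impulse response has no energy")

    # Collect (time, level in dB) points inside the fitting range.
    points = []
    for n, energy in enumerate(edc):
        if energy <= 0.0:
            break
        level = 10.0 * math.log10(energy / edc[0])
        if lower_db <= level <= upper_db:
            points.append((n / fs, level))
    if len(points) < 2:
        raise ValueError("not enough decay range to fit")

    # Least-squares slope of level against time, in dB per second.
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(l for _, l in points) / len(points)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    return -60.0 / slope
```

Running the function on several band-filtered copies of the same impulse response would give one value per band, which is the kind of frequency-dependent vector a room identification experiment could compare.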

Citations: 12

The geometry of sound-source localization using non-coplanar microphone arrays

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-20 DOI: 10.1109/WASPAA.2013.6701896

Xavier Alameda-Pineda, R. Horaud, B. Mourrain

This paper addresses the task of sound-source localization from time delay estimates using arbitrarily shaped non-coplanar microphone arrays. We fully exploit the direct path propagation model and our contribution is threefold: we provide a necessary and sufficient condition for a set of time delays to correspond to a sound source position, a proof of the uniqueness of this position, and a localization mapping to retrieve it. The time delay estimation task is cast as a non-linear multivariate optimization problem constrained by necessary and sufficient conditions on the time delays. Two global optimization techniques to estimate time delays and localize the sound source are investigated. We report an extensive set of experiments and comparisons with state-of-the-art methods on simulated and real data in the presence of noise and reverberation.
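The direct path propagation model mentioned above relates a source position to the time delays between microphones. The short Python sketch below shows only that forward model, with delays taken relative to the first microphone. The function name, the reference-microphone convention, and the speed of sound value are assumptions for illustration; the paper's feasibility condition and localization mapping are not reproduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def time_delays(source, mics, c=SPEED_OF_SOUND):
    """Delays (s) of each microphone relative to the first one.

    Under direct-path propagation the delay is the difference in
    source-to-microphone distance divided by the speed of sound.
    """
    dists = [math.dist(source, m) for m in mics]
    return [(d - dists[0]) / c for d in dists[1:]]
```

For example, with four non-coplanar microphones at `(0, 0, 0)`, `(1, 0, 0)`, `(0, 1, 0)` and `(0, 0, 1)`, a source at `(0.5, 0.5, 0.5)` is equidistant from all of them, so every relative delay is zero.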

Citations: 2

Sound acquisition in noisy and reverberant environments using virtual microphones

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701869

K. Kowalczyk, O. Thiergart, A. Craciun, Emanuël Habets

In hands-free communication applications, the main goal is to capture desired sounds, while reducing noise and interfering sounds. However, for natural-sounding telepresence systems, the spatial sound image should also be preserved. Using a recently proposed method for generating the signal of a virtual microphone (VM), one can recreate the sound image from an arbitrary point of view in the sound scene (e.g., close to a desired speaker), while being able to place the physical microphones outside the sound scene. In this paper, we present a method for synthesizing a VM signal in noisy and reverberant environments, where the estimation of the required direct and diffuse sound components is performed using two multichannel linear filters. The direct sound component is estimated using a multichannel Wiener filter, while the diffuse sound component is estimated using a linearly constrained minimum variance filter followed by a single-channel Wiener filter. Simulations in a noisy and reverberant environment show the applicability of the proposed method for sound acquisition in a scenario in which two microphone arrays are installed in a large TV.

Citations: 7

Broadband sensor location selection using convex optimization in very large scale arrays

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701889

Y. Lai, R. Balan, Heiko Claussen, J. Rosca

Consider a sensing system using a large number N of microphones placed in multiple dimensions to monitor a broadband acoustic field. Using all the microphones at once is impractical because of the amount of data generated. Instead, we choose a subset of D microphones to be active. Specifically, we wish to find the set of D microphones that minimizes the largest interference gain at multiple frequencies while monitoring a target of interest. A direct combinatorial approach, testing all N-choose-D subsets of microphones, is impractical because of the problem size. Instead, we use a convex optimization technique that induces sparsity through an l1 penalty to determine which subset of microphones to use. We test the robustness of our solution through simulated annealing and compare its performance against a classical beamformer that maximizes SNR. Since switching from one subset of D microphones to another at every sample is possible, we construct a space-time-frequency sampling scheme that achieves near-optimal performance.
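To get a feel for why exhaustive search over subsets is impractical, the number of candidate subsets can be counted directly. The sketch below is only that count; the array sizes in the usage line are hypothetical and do not come from the paper.

```python
import math


def subset_count(n_mics, n_active):
    """Number of distinct ways to choose n_active microphones out of n_mics."""
    return math.comb(n_mics, n_active)


if __name__ == "__main__":
    # Hypothetical sizes: 1024 microphones with 32 active at a time.
    print(subset_count(1024, 32))
```

Each of those subsets would need its own interference-gain evaluation at every frequency, which is the cost the sparsity-inducing convex relaxation avoids.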

Citations: 3

A fast Griffin-Lim algorithm

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701851

Nathanael Perraudin, P. Balázs, P. Søndergaard

In this paper, we present a new algorithm to estimate a signal from its short-time Fourier transform modulus (STFTM). This algorithm is computationally simple and is obtained by an acceleration of the well-known Griffin-Lim algorithm (GLA). Before deriving the algorithm, we give a new interpretation of the GLA and formulate the phase recovery problem in an optimization form. We then present experimental results in which the new algorithm is tested on various signals. The new algorithm not only converges significantly faster but also recovers the signals with a smaller error than the traditional GLA.
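A minimal Python sketch of the accelerated iteration, as it is commonly stated: alternate projections onto the magnitude constraint and onto the set of consistent spectrograms, then extrapolate with a momentum term `alpha`. The consistency projection (an inverse STFT followed by an STFT) is passed in as `project_consistent`, a hypothetical callable operating on a flat list of complex coefficients. The initialization and the default `alpha=0.99` are assumptions rather than details given in the abstract.

```python
def fast_griffin_lim(magnitude, project_consistent, init, n_iter=100, alpha=0.99):
    """Accelerated Griffin-Lim style iteration on a flat list of coefficients.

    magnitude:          target moduli, one non-negative float per coefficient.
    project_consistent: callable mapping coefficients to the nearest consistent
                        set of coefficients (hypothetical, e.g. STFT of ISTFT).
    init:               initial complex coefficients (supplies the start phase).
    """
    def project_magnitude(coeffs):
        # Keep each phase, impose the target modulus.
        out = []
        for target, value in zip(magnitude, coeffs):
            modulus = abs(value)
            out.append(target * value / modulus if modulus > 0 else complex(target))
        return out

    c_prev = project_magnitude(init)
    t = project_consistent(c_prev)
    for _ in range(n_iter):
        c = project_consistent(project_magnitude(t))
        # Momentum step: extrapolate along the most recent update direction.
        t = [ci + alpha * (ci - pi) for ci, pi in zip(c, c_prev)]
        c_prev = c
    return c_prev
```

Setting `alpha=0` makes the extrapolation vanish, so the loop reduces to the plain Griffin-Lim iteration; that gives a convenient baseline when comparing convergence.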

Citations: 128

Hierarchical modeling using automated sub-clustering for sound event recognition

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701862

M. Niessen, T. V. Kasteren, A. Merentitis

The automatic recognition of sound events allows for novel applications in areas such as security, mobile and multimedia. In this work we present a hierarchical hidden Markov model for sound event detection that automatically clusters the inherent structure of the events into sub-events. We evaluate our approach on an IEEE audio challenge dataset consisting of office sound events and provide a systematic comparison of the various building blocks of our approach to demonstrate the effectiveness of incorporating certain dependencies in the model. The hierarchical hidden Markov model achieves an average frame-based F-measure recognition performance of 45.5% on a test dataset that was used to evaluate challenge submissions. We also show how the hierarchical model can be used as a meta-classifier, although in the particular application this did not lead to an increase in performance on the test dataset.

Citations: 24

Estimation of room dimensions from a single impulse response

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701867

Dejan Markovic, F. Antonacci, A. Sarti, S. Tubaro

In this paper we propose a methodology for estimating the geometry of an environment based on a single Acoustic Impulse Response (AIR). The estimation algorithm makes use of tools for the modeling of propagation based on geometrical acoustics. A suitable cost function evaluates the distance between the simulated and measured AIRs. The room minimizing the cost function is chosen as the correct one. The cost function is strongly nonlinear. As a consequence, in order to reduce the complexity of the minimization problem, the algorithm needs a hypothesis about the class of geometry of the environment under analysis, such as rectangular or L-shaped rooms. We prove the effectiveness of the proposed algorithm with a number of simulations of increasing complexity.
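The simulate-compare-minimize loop described above can be sketched in Python for the rectangular-room hypothesis. This is a heavily simplified illustration: it assumes known source and receiver positions, models only the direct path and first-order reflections via mirror-image sources, compares sorted path lengths instead of full impulse responses, and replaces the paper's minimization with a grid search. All function names are hypothetical.

```python
import itertools
import math


def first_order_path_lengths(dims, src, rcv):
    """Sorted direct and first-order reflection path lengths in a rectangular room.

    The room spans [0, dims[k]] along each axis k.
    """
    lengths = [math.dist(src, rcv)]
    for axis in range(3):
        for wall in (0.0, dims[axis]):
            image = list(src)
            image[axis] = 2.0 * wall - src[axis]  # mirror the source in the wall
            lengths.append(math.dist(tuple(image), rcv))
    return sorted(lengths)


def cost(dims, measured, src, rcv):
    """Squared distance between simulated and measured path lengths."""
    simulated = first_order_path_lengths(dims, src, rcv)
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))


def estimate_dims(measured, src, rcv, grids):
    """Grid search over candidate (Lx, Ly, Lz) triples for the minimum cost."""
    candidates = itertools.product(*grids)
    return min(candidates, key=lambda dims: cost(dims, measured, src, rcv))
```

Dividing the path lengths by the speed of sound gives arrival times, which is how such a model would be tied to peaks in a measured AIR.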

Citations: 14

Sparse representation and epoch estimation of voiced speech

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701885

J. Gunther, T. Moon

Whereas most approaches to linear speech prediction fail to account for the quasi-periodic glottal flow, this paper incorporates a model for the glottal flow derivative (GFD) directly into the linear prediction problem. A linear model for the prediction error is obtained by constructing a dictionary of time-shifted GFD pulses. The pulses are constructed by applying glottal inverse filtering (GIF) to recorded speech. Minimizing the difference between the linear prediction residual and a sparse combination of the pulses in the dictionary leads to joint estimation of the linear predictor as well as a sparse representation for the prediction error that reveals the instants of vocal tract excitation (epochs). The method is applied to voiced segments extracted from the CMU Arctic dataset which also includes electro-glottograms. Results show that the proposed method is effective in estimating the parameters of interest and that GIF-based pulses more accurately model GFD pulses occurring in real speech than pulses computed using the mathematical models.

Citations: 0

Multichannel HR-NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701824

R. Badeau, Mark D. Plumbley

Several probabilistic models involving latent components have been proposed for modelling time-frequency (TF) representations of audio signals (such as spectrograms), notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to a stereophonic piano signal, and proves capable of accurately modelling reverberation and restoring missing observations.

Citations: 7
