
Latest publications: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Non-negative matrix factorization for irregularly-spaced transforms
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701845
P. Smaragdis, Minje Kim
Non-negative factorizations of spectra have recently been a very popular tool for various audio tasks. A long-standing problem with these methods is that they cannot easily be applied to other kinds of spectral decompositions such as sinusoidal models, constant-Q transforms, wavelets and reassigned spectra. This is because with these transforms the frequency and/or time values are real-valued and not sampled on a regular grid. We therefore cannot represent them as a matrix that we can later factorize. In this paper we present a formulation of non-negative matrix factorization that can be applied to data with real-valued indices, thereby making this family of methods feasible on a broader family of time/frequency transforms.
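The paper's formulation for real-valued indices is not reproduced here, but the grid-based baseline it generalizes can be sketched. Below is a minimal NMF with Euclidean multiplicative (Lee-Seung) updates on a regular spectrogram-like grid; the data and rank are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Grid-based NMF, V ≈ W @ H, via Euclidean multiplicative updates."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral templates
    return W, H

# Toy non-negative "spectrogram" built from two rank-1 components.
V = np.outer([1, 2, 3, 4.0], [1, 0, 1, 0, 1.0]) \
    + np.outer([4, 3, 2, 1.0], [0, 1, 0, 1, 0.0])
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the row and column indices of `V` are integers on a regular grid, ordinary matrix factorization applies; it is exactly this gridding assumption that the paper removes.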
Citations: 3
Efficient implementation of the spectral division method for arbitrary virtual sound fields
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701828
J. Ahrens, Mark R. P. Thomas, I. Tashev
The Spectral Division Method is an analytic approach for sound field synthesis that determines the loudspeaker driving function in the wavenumber domain. Compact expressions for the driving function in time-frequency domain or in time domain can only be determined for a low number of special cases. Generally, the involved spatial Fourier transforms have to be evaluated numerically. We present a detailed description of the computational procedure and minimize the number of required computations by exploiting the following two aspects: 1) The interval for the spatial sampling of the virtual sound field can be selected for each time-frequency bin, whereby low time-frequency bins can be sampled more coarsely, and 2) the driving function only needs to be evaluated at the locations of the loudspeakers of a given array. The inverse spatial Fourier transform is therefore not required to be evaluated at all initial spatial sampling points but only at those locations that coincide with loudspeakers.
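Aspect 2) — evaluating the inverse spatial Fourier transform only at the loudspeaker positions — amounts to a direct summation of the wavenumber spectrum at arbitrary points rather than a full inverse FFT. A minimal sketch under an assumed e^{+jk_x x} synthesis convention (the loudspeaker positions and the Gaussian test spectrum are illustrative), verified against a known transform pair:

```python
import numpy as np

def inverse_spatial_ft_at(D_k, kx, x_positions):
    """Evaluate d(x) = (1/2π) ∫ D(kx) e^{j kx x} dkx by direct Riemann summation,
    but only at the requested spatial positions."""
    dk = kx[1] - kx[0]
    # Matrix of complex exponentials: one row per evaluation position.
    E = np.exp(1j * np.outer(x_positions, kx))
    return (E @ D_k) * dk / (2 * np.pi)

# Known pair: the spectrum of exp(-x²/(2σ²)) is σ√(2π)·exp(-σ²kx²/2).
kx = np.linspace(-50, 50, 4001)
sigma = 0.2
D_k = np.sqrt(2 * np.pi) * sigma * np.exp(-0.5 * (sigma * kx) ** 2)
x_ls = np.array([-0.5, 0.0, 0.5])          # hypothetical loudspeaker positions
d = inverse_spatial_ft_at(D_k, kx, x_ls)
expected = np.exp(-0.5 * (x_ls / sigma) ** 2)
```

For L loudspeakers and K wavenumber samples this costs O(LK) per time-frequency bin instead of an O(K log K) transform evaluated at all K initial sampling points, which is the saving the paper exploits when L ≪ K.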
Citations: 6
Effect of higher-order ambisonics on evaluating beamformer benefit in realistic acoustic environments
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701882
Chris Oreinos, J. Buchholz, Jorge Mejia
Multi-channel loudspeaker systems have been proposed to assess the real-life benefit of devices such as hearing aids, cochlear implants, or mobile phones. This paper investigates to what extent sound fields recreated by Higher-Order Ambisonics (HOA) can be used to evaluate the performance of spatially selective multi-microphone processing schemes (beamformers) inside complex acoustic environments. Two example schemes are considered: an adaptive directional microphone (ADM) and a contralateral suppression bilateral beamformer (BBF), both implemented in the context of a hearing aid device. The acoustic scenarios consist of a single speech target (0°) competing against three speech jammers (±90° and 180°) set either in an anechoic or in a reverberant simulated classroom (T30 = 0.6s). The HOA effect on the directional algorithm performance is quantified through: (a) the adaptive, frequency-dependent, algorithm gains, (b) the SNR improvement calculated in one-third octave bands, and (c) the processed target frequency response. The HOA reconstruction errors influence the beamformers in mainly two ways; first, by altering the spatial characteristics of the sound field, which in turn modifies the adaptation of the algorithms, and second, by affecting the spectral content of the sources. The results suggest that although HOA (here 7th order) does not degrade the broadband, long-term, intelligibility-weighted SNR improvement of the two beamformers, it imposes a low-pass effect on the processed target. This renders the HOA coding problematic above the system's cut-off frequency.
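Metric (b), the SNR improvement computed in one-third octave bands, can be sketched by grouping FFT bins into bands with edges at fc·2^(±1/6). A minimal version that assumes separate access to target and noise signals (the band limits, sampling rate, and test signals below are illustrative, not the paper's setup):

```python
import numpy as np

def third_octave_snr(signal, noise, fs, f_lo=100.0, f_hi=8000.0):
    """SNR (dB) per one-third-octave band from magnitude-squared spectra."""
    n = len(signal)
    f = np.fft.rfftfreq(n, 1 / fs)
    S = np.abs(np.fft.rfft(signal)) ** 2
    N = np.abs(np.fft.rfft(noise)) ** 2
    centers, snrs = [], []
    fc = f_lo
    while fc <= f_hi:
        band = (f >= fc / 2 ** (1 / 6)) & (f < fc * 2 ** (1 / 6))
        if band.any() and N[band].sum() > 0:
            centers.append(fc)
            snrs.append(10 * np.log10(S[band].sum() / N[band].sum()))
        fc *= 2 ** (1 / 3)
    return np.array(centers), np.array(snrs)

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)        # target energy at 1 kHz
noise = 0.1 * np.sin(2 * np.pi * 4000 * t) # interferer at 4 kHz
fc, snr = third_octave_snr(tone, noise, fs)
```

An SNR *improvement* would then be the per-band difference between this quantity at the beamformer output and at a reference microphone.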
Citations: 4
Evaluation of spectral transforms for music signal analysis
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701843
A. Nagathil, Rainer Martin
In this paper we present a study on the spectral analysis of music signals, comparing the time-domain representation, the short-time Fourier transform (STFT) and the constant-Q transform (CQT), each additionally combined with different signal-dependent transforms. The comparison is carried out with respect to spectral compactness, data compression ability and the temporal continuity of transform coefficients, for which we propose measures in this paper. In addition, we investigate the performance of these transforms in a source separation task in which we strive to recover the main melody line from a mixed-instrument recording. Our experiments reveal that performing a rank-reduced principal component analysis based on a CQT representation yields the best results in terms of instrumental source separation measures and listening impression, which points towards the potential of the CQT for improving existing source separation methods, which are currently often based on the STFT.
Citations: 3
Speech enhancement by sparse, low-rank, and dictionary spectrogram decomposition
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701883
Zhuo Chen, D. Ellis
Speech enhancement requires some principle by which to distinguish speech and noise, and the most successful separation requires strong models for both speech and noise. If, however, the noise encountered differs significantly from the system's assumptions, performance will suffer. In this work, we propose a novel speech enhancement system based on decomposing the spectrogram into sparse activations of a dictionary of target speech templates and a low-rank background model, which makes few assumptions about the noise other than its limited spectral variation. A variation of this model specifically designed to handle transient noise intrusions is also proposed. Evaluation via BSS EVAL and PESQ shows that, compared to several traditional speech enhancement algorithms including log-MMSE, the new approaches improve signal-to-distortion ratio in most cases and PESQ in high-noise conditions.
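The paper's exact model (sparse dictionary activations plus a low-rank background) is not reproduced here, but the general sparse-plus-low-rank idea can be sketched with a simple alternating scheme: a truncated SVD fits the low-rank layer, and soft-thresholding of the residual extracts the sparse layer. All parameters, the rank, and the toy data are illustrative.

```python
import numpy as np

def sparse_low_rank(X, rank=1, lam=0.5, n_iter=50):
    """Alternate a truncated-SVD low-rank fit with soft-thresholding of the residual."""
    S = np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank background
        R = X - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0) # sparse foreground
    return L, S

rng = np.random.default_rng(2)
background = np.outer(np.ones(20), rng.random(30))  # rank-1 "noise" layer
spikes = np.zeros((20, 30))
spikes[5, 7] = 5.0                                   # transient events
spikes[12, 21] = 4.0
L, S = sparse_low_rank(background + spikes)
```

The decomposition recovers the transients in `S` while `L` absorbs the spectrally static background, mirroring the role the low-rank term plays for slowly varying noise.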
Citations: 50
Modulation filtering for structured time-frequency estimation of audio signals
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701863
Kai Siedenburg, P. Depalle
This paper considers the estimation of time-frequency coefficients of audio signals from the viewpoint of spectro-temporal modulation analysis. It is shown that estimators employing neighborhood-smoothed shrinkage masks are closely related to modulation filters. The usefulness of this perspective is first demonstrated by separating an artificial mixture of components with different orientations in time-frequency. It is then shown that modulation filters can be learned directly from audio, and that their use improves the state of the art in noise reduction by a small margin, as measured by signal-to-noise ratio.
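The connection can be illustrated concretely: smoothing a shrinkage mask over a time-frequency neighborhood is a separable moving average, i.e. a low-pass filter on the mask's spectro-temporal modulations. A minimal sketch (kernel sizes, threshold, and the toy spectrogram are illustrative):

```python
import numpy as np

def smooth_mask(mask, kt=3, kf=3):
    """Neighborhood smoothing of a TF mask: a separable moving average along
    time then frequency — a low-pass modulation filter applied to the mask."""
    kernel_t = np.ones(kt) / kt
    kernel_f = np.ones(kf) / kf
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel_t, mode='same'), 1, mask)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel_f, mode='same'), 0, out)
    return out

# Hard shrinkage mask from a toy magnitude spectrogram, then smoothed.
rng = np.random.default_rng(3)
mag = rng.random((16, 32))
mag[4:8, :] += 2.0                       # a horizontal (tonal) region
hard = (mag > 1.0).astype(float)
soft = smooth_mask(hard)
```

The smoothed mask keeps slowly varying (low-modulation-rate) structure at full gain while attenuating isolated, rapidly fluctuating mask entries — the behavior of a modulation filter.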
Citations: 2
The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701894
K. Kinoshita, Marc Delcroix, Takuya Yoshioka, T. Nakatani, A. Sehr, Walter Kellermann, R. Maas
Recently, substantial progress has been made in the field of reverberant speech signal processing, including both single- and multichannel dereverberation techniques, and automatic speech recognition (ASR) techniques robust to reverberation. To evaluate state-of-the-art algorithms and obtain new insights regarding potential future research directions, we propose a common evaluation framework including datasets, tasks, and evaluation metrics for both speech enhancement and ASR techniques. The proposed framework will be used as a common basis for the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. This paper describes the rationale behind the challenge, and provides a detailed description of the evaluation framework and benchmark results.
Citations: 377
A hybrid LF-Rosenberg frequency-domain model of the glottal pulse
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701892
Sandra Dias, Aníbal J. S. Ferreira
In this paper we describe innovative advances in the design of a new frequency-domain algorithm for glottal source estimation whose conceptual approach we have reported recently [1]. Those advances result from accurate sinusoidal/harmonic analysis and synthesis of two concomitant acoustic signals: the glottal source signal captured near the vocal folds, and the corresponding voiced signal captured outside the mouth. We describe the experimental procedure, which was performed by an ORL specialist using a rigid video-laryngoscope and two tiny, high-quality microphones. Six subjects participated in the tests, and recordings were made for vowels /a/ and /i/. The data analysis allowed us to draw conclusions on the magnitude and on the phase-related NRD features of the glottal source signal. In addition, a new frequency-domain glottal pulse model combining features of the Liljencrants-Fant and Rosenberg models has been devised that is a better match to the observed data. The derivatives of the three models are obtained using accurate frequency-domain processing. The paper concludes with the next research steps.
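The paper's hybrid frequency-domain model is not reproduced here, but the classic time-domain Rosenberg pulse it draws on has a simple closed form: a raised-cosine opening phase followed by a quarter-cosine closing phase. A sketch with illustrative parameter values (sampling rate, f0, and quotients are assumptions):

```python
import numpy as np

def rosenberg_pulse(fs=16000, f0=100.0, open_quotient=0.6, speed_quotient=2.0):
    """One period of the Rosenberg glottal-flow pulse."""
    T0 = 1.0 / f0
    Te = open_quotient * T0                                # open-phase duration
    Tp = Te * speed_quotient / (1 + speed_quotient)        # opening portion
    Tn = Te - Tp                                           # closing portion
    t = np.arange(int(round(T0 * fs))) / fs
    g = np.zeros_like(t)
    rise = t <= Tp
    fall = (t > Tp) & (t <= Tp + Tn)
    g[rise] = 0.5 * (1 - np.cos(np.pi * t[rise] / Tp))     # raised-cosine opening
    g[fall] = np.cos(0.5 * np.pi * (t[fall] - Tp) / Tn)    # quarter-cosine closing
    return g                                               # closed phase stays 0

g = rosenberg_pulse()
```

The LF model replaces these trigonometric segments with an exponentially growing sinusoid and an exponential return phase; the hybrid model in the paper combines features of both in the frequency domain.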
Citations: 4
Deconvolution of plenacoustic images
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701850
Lucio Bianchi, Dejan Markovic, F. Antonacci, A. Sarti, S. Tubaro
In this paper we propose a methodology aimed at improving the resolution capabilities of plenacoustic imaging, based on deconvolution techniques borrowed from aerospace acoustic imaging. To reduce the computational burden, we also propose a modification of the minimization problem that exploits the highly structured information contained in the plenacoustic image. Experiments and simulations show the improvement in accuracy gained by applying the deconvolution operator.
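The paper's modified minimization is not reproduced here, but the family of deconvolution methods borrowed from aeroacoustic imaging (CLEAN-style) can be sketched: repeatedly locate the image peak, record a scaled point source there, and subtract the correspondingly shifted point-spread function. The PSF, source layout, and loop gain below are illustrative.

```python
import numpy as np

def clean_deconvolve(dirty, psf, gain=0.5, n_iter=500, threshold=1e-3):
    """CLEAN-style deconvolution of a 'dirty' map by a peak-normalised PSF."""
    residual = dirty.copy()
    components = np.zeros_like(dirty)
    pr, pc = psf.shape
    for _ in range(n_iter):
        r0, c0 = np.unravel_index(np.argmax(residual), residual.shape)
        peak = residual[r0, c0]
        if peak < threshold:
            break
        components[r0, c0] += gain * peak
        # Border-clipped subtraction of the PSF centred at the peak.
        rs, cs = r0 - pr // 2, c0 - pc // 2
        rlo, rhi = max(rs, 0), min(rs + pr, residual.shape[0])
        clo, chi = max(cs, 0), min(cs + pc, residual.shape[1])
        residual[rlo:rhi, clo:chi] -= gain * peak * psf[rlo - rs:rhi - rs,
                                                        clo - cs:chi - cs]
    return components, residual

# Synthetic dirty map: two point sources blurred by a Gaussian PSF.
y, x = np.mgrid[-4:5, -4:5]
psf = np.exp(-(x**2 + y**2) / (2 * 1.5**2))   # 9x9, peak-normalised (centre = 1)
dirty = np.zeros((32, 32))
dirty[6:15, 6:15] += 1.0 * psf                 # source at (10, 10)
dirty[16:25, 21:30] += 0.8 * psf               # source at (20, 25)
components, residual = clean_deconvolve(dirty, psf)
```

The recovered component map concentrates the smeared energy back onto the true source positions, which is the resolution gain the paper pursues.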
Citations: 9
Modeling early reflections of room impulse responses using a radiance transfer method
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701841
Hequn Bai, G. Richard, L. Daudet
In this paper we propose an extension of the Acoustic Radiance Transfer (ART) method for the modeling of room acoustics. The original ART method is very efficient for modeling diffuse reflections and the late reverberation, but does not represent the early echoes well. We therefore propose an extension of the ART method that models the early part while keeping the advantages of the original method for late-reverberation simulation. The experimental results confirm that the proposed method gives, on average, more accurate reconstruction of the early reflections than the traditional ART method, and that comparable accuracy can be obtained at lower complexity and memory cost.
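The early part that the extension targets is classically captured by image-source reasoning: mirror the source across the walls and sum delayed, distance-attenuated arrivals. A 1-D sketch between two parallel walls — the geometry and reflection order are illustrative, and this is not the paper's ART formulation:

```python
import numpy as np

def image_source_1d(src, rcv, room_len, order, c=343.0):
    """Image sources of a point between two rigid walls at x=0 and x=room_len:
    images sit at 2·n·L ± src. Returns arrival times (s) and 1/r amplitudes."""
    times, amps = [], []
    for n in range(-order, order + 1):
        for x_img in (2 * n * room_len + src, 2 * n * room_len - src):
            d = abs(x_img - rcv)
            if d > 0:
                times.append(d / c)
                amps.append(1.0 / d)       # spherical spreading only
    idx = np.argsort(times)
    return np.array(times)[idx], np.array(amps)[idx]

# Source at x=1 m, receiver at x=3 m, walls 5 m apart, up to 2nd-order images.
t, a = image_source_1d(src=1.0, rcv=3.0, room_len=5.0, order=2)
```

The first entry is the direct path (2 m here); later entries are the sparse early echoes whose accurate timing the diffuse ART machinery alone cannot provide.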
Citations: 4