
Latest Publications: 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Modeling nonlinear circuits with linearized dynamical models via kernel regression
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701830
Daniel J. Gillespie, D. Ellis
This paper introduces a novel method for the solution of guitar distortion circuits based on the use of kernels. The proposed algorithm uses a kernel regression framework to linearize the inherent nonlinear dynamical systems created by such circuits and proposes data and kernel selection algorithms well suited to learn the required regression parameters. Examples are presented using the One Capacitor Diode Clipper and the Common-Cathode Tube Amplifier.
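A rough sketch of the kernel-regression ingredient only (not the paper's linearized dynamical model): an RBF kernel ridge regressor fitted to a static soft-clipper standing in for a distortion stage. The tanh nonlinearity, kernel width, and regularization weight are assumptions made for illustration.

```python
import numpy as np

# Minimal RBF kernel ridge regression fitted to a static soft-clipper
# stand-in for a distortion circuit (illustration only; the paper models
# the full nonlinear dynamical system, which this sketch omits).

rng = np.random.default_rng(0)

def soft_clip(x):
    # Stand-in nonlinearity playing the role of the diode clipper.
    return np.tanh(3.0 * x)

# Training data: input voltage samples and the circuit's response.
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = soft_clip(x_train) + 0.01 * rng.standard_normal(200)

def rbf_kernel(a, b, gamma=10.0):
    # K[i, j] = exp(-gamma * (a_i - b_j)^2)
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

# Kernel ridge regression: alpha = (K + lam * I)^-1 y
lam = 1e-3
K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

# Predict the circuit response for a short 440 Hz test tone.
x_test = np.sin(2 * np.pi * 440 * np.arange(0, 0.01, 1 / 44100))
y_pred = rbf_kernel(x_test, x_train) @ alpha
print("max abs error:", np.max(np.abs(y_pred - soft_clip(x_test))))
```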
Citations: 7
Sine-wave based PSOLA pitch scaling with real-time pitch marking
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701864
R. McAulay
The sinusoidal system was reconfigured to use pitch synchronous overlap-add (PSOLA) synthesis so that pitch shifting could be achieved by moving the sine-wave parameters to the pitch-shifted synthesis frames. This, in turn, led to a pitch-marking technique based on the sine-wave phases that required no forward-backward searching for epochs, resulting in real-time pitch scaling. Having access to the sine wave amplitudes led to realistic re-shaping of the vocal tract characteristic, hence the system is well suited for real-time pitch scaling and vocal tract modification of speech.
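The paper's contribution lies in the sine-wave-phase pitch marking; the sketch below shows only the generic time-domain PSOLA overlap-add step that such marks feed, applied to a synthetic tone whose pitch marks are assumed known rather than estimated.

```python
import numpy as np

# Minimal time-domain PSOLA pitch shift on a synthetic voiced signal with
# known pitch marks. The paper derives the marks from sine-wave phases in
# real time; here they are simply assumed at known period multiples.

fs = 16000
f0 = 200.0                      # assumed source pitch
period = int(round(fs / f0))
n = fs // 2
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.3 * np.sin(2 * np.pi * 2 * f0 * t)

# Analysis pitch marks, one per period (assumed, not estimated).
marks = np.arange(period, n - period, period)

def psola_pitch_shift(x, marks, period, factor):
    """Shift pitch by `factor` (>1 raises pitch): place two-period
    Hann-windowed grains at synthesis marks spaced period/factor apart,
    each copied from the nearest analysis mark, preserving duration."""
    out = np.zeros_like(x)
    win = np.hanning(2 * period)
    syn_marks = np.arange(marks[0], len(x) - period, period / factor)
    for s in syn_marks:
        m = marks[np.argmin(np.abs(marks - s))]   # nearest analysis grain
        i = int(round(s))
        if i - period < 0 or i + period > len(out):
            continue
        out[i - period:i + period] += x[m - period:m + period] * win
    return out

y = psola_pitch_shift(x, marks, period, factor=1.5)
```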
Citations: 3
MAP estimation of driving signals of loudspeakers for sound field reproduction from pressure measurements
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701895
Shoichi Koyama, K. Furuya, Y. Hiwasaki, Y. Haneda
Sound field reproduction methods calculate driving signals of loudspeakers to reproduce the desired sound field. In common recording and reproduction systems, only the sound pressures at multiple positions in a recording room are available as a description of the desired sound field; therefore, signal transformation algorithms from sound pressures into driving signals (SP-DS conversion) are necessary. Although several SP-DS conversion methods have been proposed, they do not take into account a priori information about the recorded sound field. However, approximate positions of sound sources can be obtained from the received microphone signals or other sensor data. We propose an SP-DS conversion method based on maximum a posteriori (MAP) estimation for planar or linear microphone and loudspeaker array configurations. The basis functions and their coefficients representing the driving signals of the loudspeakers are optimized based on prior information about the source positions. Numerical simulation results indicate that the proposed method can achieve higher reproduction accuracy than current SP-DS conversion methods, especially at frequencies above the spatial Nyquist frequency.
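A minimal sketch of the underlying estimation, posed as Tikhonov-regularized least squares (the MAP estimate under a zero-mean Gaussian prior): free-field transfer functions from a loudspeaker line to pressure control points are inverted for driving signals at one frequency. The geometry, frequency, and prior weight are assumptions, and the paper's source-position-dependent basis functions are not reproduced.

```python
import numpy as np

# MAP-style (Tikhonov-regularized least-squares) estimate of loudspeaker
# driving signals from pressures at control points, at one frequency.
# Geometry, frequency and prior weight are assumptions for illustration.

c = 343.0                     # speed of sound [m/s]
f = 1000.0                    # analysis frequency [Hz]
k = 2 * np.pi * f / c         # wavenumber

def green(src, rcv):
    """Free-field 3-D Green's function between two points."""
    r = np.linalg.norm(rcv - src)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

# Linear array of 16 loudspeakers and a grid of 25 control points.
spk = np.stack([np.linspace(-1.5, 1.5, 16), np.zeros(16), np.zeros(16)], axis=1)
gx, gy = np.meshgrid(np.linspace(-0.5, 0.5, 5), np.linspace(1.0, 2.0, 5))
ctl = np.stack([gx.ravel(), gy.ravel(), np.zeros(25)], axis=1)

# Transfer matrix G: pressures at control points = G @ driving signals.
G = np.array([[green(s, r) for s in spk] for r in ctl])

# "Measured" desired pressures: a point source behind the array (assumed).
src = np.array([0.3, -1.0, 0.0])
p_des = np.array([green(src, r) for r in ctl])

# MAP / Tikhonov solution: d = (G^H G + lam * I)^-1 G^H p
lam = 1e-3
d = np.linalg.solve(G.conj().T @ G + lam * np.eye(len(spk)), G.conj().T @ p_des)
err = np.linalg.norm(G @ d - p_des) / np.linalg.norm(p_des)
print(f"relative reproduction error: {err:.3f}")
```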
Citations: 3
Optimizing frame analysis with non-integer shift for sampling mismatch compensation of long recording
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701833
S. Miyabe, Nobutaka Ono, S. Makino
This paper proposes blind synchronization of an ad-hoc microphone array in the short-time Fourier transform (STFT) domain, using optimized frame analysis centered at non-integer discrete times. We show that the drift caused by the sampling frequency mismatch of asynchronous observation channels can be disregarded over a short interval. Utilizing this property, the sampling frequency mismatch and the recording start offset are estimated roughly by finding two pairs of short intervals corresponding to the same continuous time. Using this estimate, the STFT analysis is roughly synchronized between channels with optimized frame centers. Since the optimized frame center is generally non-integer, we approximate the frame analysis by linear-phase filtering of the frame centered at the nearest integer sample. Maximum likelihood estimation then refines the compensation of the sampling frequency mismatch.
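One building block is the non-integer frame shift realized as linear-phase filtering in the DFT domain; the toy example below delays a single frame by a fractional number of samples by multiplying its spectrum with exp(-j2πkτ/N). The test signal and delay value are assumptions.

```python
import numpy as np

# Fractional-sample delay of one analysis frame via linear phase in the
# DFT domain, the building block used to centre a frame at a non-integer
# time. Signal and delay value are assumptions for illustration.

N = 512
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N)          # frame content (periodic in N)

def fractional_delay(frame, tau):
    """Circularly delay `frame` by `tau` samples (tau may be non-integer)."""
    k = np.fft.rfftfreq(len(frame)) * len(frame)   # bin indices 0..N/2
    X = np.fft.rfft(frame)
    return np.fft.irfft(X * np.exp(-2j * np.pi * k * tau / len(frame)), len(frame))

y = fractional_delay(x, tau=0.37)

# For a tone that is periodic in the frame, the result matches an exact
# time shift of the underlying continuous signal.
ref = np.cos(2 * np.pi * 5 * (n - 0.37) / N)
print("max deviation:", np.max(np.abs(y - ref)))
```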
Citations: 15
An efficient time-varying loudness model
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701884
D. Ward, C. Athwal, M. Köküer
In this paper, we present an efficient loudness model applicable to time-varying sounds. We use the model of Glasberg and Moore (J. Audio Eng. Soc., 2002) as the basis for our developments, proposing a number of optimization techniques to reduce the computational complexity at each stage of the model. Efficient alternatives to computing the multi-resolution DFT, excitation pattern and pre-cochlea filter are presented. Absolute threshold and equal loudness contour predictions are computed and compared against both steady-state and time-varying loudness models to evaluate the combined accuracy of these techniques in the frequency domain. Finally, computational costs and loudness errors are quantified for a range of time-varying stimuli, demonstrating that the optimized model can execute approximately 50 times faster within tolerable error bounds.
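One named component, the multi-resolution DFT, can be sketched as several FFT lengths computed in parallel with each frequency band taken from one of them; the band edges, window lengths, and Hann windowing below are assumptions, not the values used by Glasberg and Moore or by the authors.

```python
import numpy as np

# Toy multi-resolution DFT: long windows for low frequencies (fine frequency
# resolution), short windows for high frequencies (fine time resolution).
# Band edges and window lengths are assumptions for illustration; scaling
# differences between the window lengths are ignored here.

fs = 32000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 4000 * t)

# (window length in samples, band covered in Hz) - coarse example split.
resolutions = [(2048, (0.0, 500.0)), (512, (500.0, 4000.0)), (128, (4000.0, fs / 2))]

def multires_spectrum(frame_center):
    """Return (freqs, magnitudes) around one analysis instant, taking each
    band from the FFT whose window length is assigned to it."""
    freqs, mags = [], []
    for win_len, (lo, hi) in resolutions:
        seg = x[frame_center - win_len // 2: frame_center + win_len // 2]
        spec = np.abs(np.fft.rfft(seg * np.hanning(win_len)))
        f = np.fft.rfftfreq(win_len, d=1 / fs)
        keep = (f >= lo) & (f < hi)
        freqs.append(f[keep])
        mags.append(spec[keep])
    return np.concatenate(freqs), np.concatenate(mags)

f, m = multires_spectrum(frame_center=fs // 2)
print("strongest component near", f[np.argmax(m)], "Hz")
```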
Citations: 3
Relative transfer function modeling for supervised source localization
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701829
Bracha Laufer-Goldshtein, R. Talmon, S. Gannot
Speaker localization is one of the most prevalent problems in speech processing. Despite significant efforts in recent decades, high reverberation levels still limit the performance of localization algorithms. Furthermore, using conventional localization methods, the information that can be extracted from dual-microphone measurements is restricted to the time difference of arrival (TDOA). In the far-field regime, this is equivalent to estimating either the azimuth or the elevation angle. A full description of the speaker's coordinates necessitates several microphones. In this contribution we tackle these two limitations by taking a manifold-learning perspective on system identification. We present a training-based algorithm, motivated by the concept of diffusion maps, that aims at recovering the fundamental controlling parameters driving the measurements. This approach turns out to be more robust to reverberation, and capable of recovering the speech source location using merely two microphone signals.
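A minimal sketch of the diffusion-maps step the method builds on: form a Gaussian affinity matrix over feature vectors (random stand-ins for RTF features here), row-normalize it into a Markov matrix, and take its leading non-trivial eigenvectors as low-dimensional coordinates. The data, kernel width, and embedding dimension are assumptions.

```python
import numpy as np

# Diffusion-maps embedding of feature vectors (random stand-ins for RTF
# features). Kernel width, data and embedding dimension are assumptions.

rng = np.random.default_rng(1)

# 300 "RTF" feature vectors that actually depend on one hidden parameter
# (a stand-in for source azimuth), plus noise.
theta = rng.uniform(0, np.pi, 300)
features = np.stack([np.cos(k * theta) for k in range(1, 11)], axis=1)
features += 0.05 * rng.standard_normal(features.shape)

def diffusion_map(X, eps, dim=2):
    """Return `dim` diffusion coordinates for the rows of X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / eps)                    # Gaussian affinity
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)           # leading eigenvalues first
    idx = order[1:dim + 1]                   # skip trivial constant eigenvector
    return (vecs[:, idx] * vals[idx]).real

coords = diffusion_map(features, eps=1.0)
# The first diffusion coordinate should track the hidden parameter.
print("corr with hidden parameter:", abs(np.corrcoef(coords[:, 0], theta)[0, 1]))
```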
Citations: 41
Perceptually motivated ANC for hearing-impaired listeners
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701834
E. Durant, Jinjun Xiao, Buye Xu, M. McKinney, Zhang Tao
The goal of noise control in hearing aids is to improve listening perception. In this paper we propose modifying a perceptually motivated active noise control (ANC) algorithm by incorporating a perceptual model into the cost function, resulting in a dynamic residual noise spectrum shaping technique based on the time-varying residual noise. The perceptual criterion to be minimized could be sharpness, discordance, annoyance, etc. As an illustrative example, we use loudness perceived by a hearing-impaired listener as the cost function. Specifically, we design the spectrum shaping filter using the listener's hearing loss and the dynamic residual noise spectrum. Simulations show significant improvements of 3-4 sones over energy reduction (ER) for severe high-frequency losses for some common noises that would be 6-12 without processing. However, average loudness across a wide range of noises is only slightly better than with ER, with greater improvements realized with increasing hearing loss. We analyze one way in which the algorithm fails and trace it to over-reliance on the common psychoacoustic modelling simplification that auditory channels are independent to a first approximation. This suggests future work that may improve performance.
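The generic mechanism being modified is an adaptive ANC update driven by a perceptually shaped error; the sketch below is a filtered-error FxLMS loop in which a fixed FIR weighting filter stands in for the listener-specific loudness weighting. All paths, filters, and the step size are assumptions, and the paper's loudness model is not implemented.

```python
import numpy as np

# Filtered-error FxLMS sketch: the error driving the adaptation is passed
# through a fixed FIR weighting filter M standing in for a perceptual
# (loudness-based) weighting. Paths, filters and step size are assumptions.

rng = np.random.default_rng(2)
n_samples = 20000
x = rng.standard_normal(n_samples)        # reference (noise source)

P = np.array([0.0, 0.6, 0.3, 0.1])        # primary path (assumed FIR)
S = np.array([0.0, 0.8, 0.2])             # secondary path (assumed known)
M = np.array([0.5, 0.3, 0.2])             # perceptual weighting stand-in

L = 16                                    # adaptive filter length
w = np.zeros(L)
mu = 0.002

d = np.convolve(x, P)[:n_samples]         # disturbance at the error mic

xbuf = np.zeros(L)                        # reference history for the filter
ybuf = np.zeros(len(S))                   # control-signal history (for S)
fxbuf = np.zeros(L)                       # weighted filtered-reference history
xr_hist = np.zeros(len(S))                # reference history (for S)
fx_hist = np.zeros(len(M))                # S-filtered reference (for M)
e_hist = np.zeros(len(M))                 # raw error history (for M)
err = np.zeros(n_samples)

for n in range(n_samples):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
    y = w @ xbuf                          # anti-noise sample
    ybuf = np.roll(ybuf, 1); ybuf[0] = y
    e = d[n] + S @ ybuf                   # residual at the error mic
    err[n] = e

    # Reference filtered through S, then through the weighting M.
    xr_hist = np.roll(xr_hist, 1); xr_hist[0] = x[n]
    fx_hist = np.roll(fx_hist, 1); fx_hist[0] = S @ xr_hist
    fxbuf = np.roll(fxbuf, 1); fxbuf[0] = M @ fx_hist

    # Error weighted by M drives the update (perceptually shaped residual).
    e_hist = np.roll(e_hist, 1); e_hist[0] = e
    ew = M @ e_hist

    w -= mu * ew * fxbuf                  # filtered-error FxLMS update

print("residual power, first vs last 2000 samples:",
      np.mean(err[:2000] ** 2), np.mean(err[-2000:] ** 2))
```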
Citations: 1
Design of arbitrary delay filterbank having arbitrary order for audio applications
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701886
A. Vijayakumar, A. Makur
The literature shows that the design of a pth-order analysis bank with qth-order synthesis filters (p ≠ q), combined with the flexibility to control the system delay, has never been addressed concomitantly. In this paper, we propose a systematic design for a filterbank of (p, q) order that can have arbitrary delay. Such filterbanks play an important role especially in applications where low-delay, high-quality signals are required, such as a digital hearing aid.
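A least-squares sketch of the p ≠ q idea under stated assumptions: given pth-order analysis filters (random stand-ins here), qth-order synthesis filters are solved for so that the overall undecimated response equals a pure delay of any chosen D. The paper's treatment of decimated operation is not reproduced.

```python
import numpy as np

# Given a set of pth-order analysis filters, design qth-order synthesis
# filters so the overall (undecimated) filterbank response is a pure delay
# of D samples, via least squares. Random analysis filters stand in for a
# real bank; any D from 0 to p + q can be chosen.

rng = np.random.default_rng(4)
p, q, K, D = 20, 32, 3, 11           # analysis order, synthesis order, bands, delay

H = [rng.standard_normal(p + 1) for _ in range(K)]   # stand-in analysis filters

# Overall response T = sum_k conv(h_k, f_k). Build the linear map from the
# stacked synthesis coefficients to T and solve for T = delayed impulse.
L = p + q + 1
cols = []
for h in H:
    for j in range(q + 1):
        col = np.zeros(L)
        col[j:j + p + 1] = h          # conv(h, delta_j) = h shifted by j
        cols.append(col)
A = np.stack(cols, axis=1)            # shape (L, K * (q + 1))
target = np.zeros(L)
target[D] = 1.0                       # desired overall response: pure delay D

f_all, *_ = np.linalg.lstsq(A, target, rcond=None)
F = f_all.reshape(K, q + 1)

# Here the residual comes out negligible, i.e. the delay target is met.
T = sum(np.convolve(h, f) for h, f in zip(H, F))
print(f"deviation from a pure {D}-sample delay: {np.linalg.norm(T - target):.2e}")
```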
Citations: 1
Under-determined source separation based on power spectral density estimated using cylindrical mode beamforming
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701836
Yusuke Hioka, T. Betlehem
Sound source signals can be separated using Wiener post-filters calculated by estimating the power spectral densities (PSDs) of sources from the outputs of a set of beamformers. This approach has been shown effective in the under-determined case where the number of sources to be separated exceeds the number of microphones. In this paper, a limit on the maximum number of separable sources is derived beyond which the problem becomes rank deficient. This study reveals the number of sources that can be separated simultaneously is related to the order of the beam patterns. Further, using the principles of cylindrical mode beamforming, the performance can be predicted as a function of frequency. The result is consistent with simulations in which the performance of separating music and speech sound sources was quantified.
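A minimal sketch of the estimation step at a single frequency bin: beamformer output powers are modeled as a known non-negative mixture (squared directivity gains) of the unknown source PSDs, recovered here by non-negative least squares and turned into a Wiener post-filter gain. The array geometry, steering directions, and source powers are assumptions, and the paper's cylindrical-mode beamformers are replaced by simple delay-and-sum beams.

```python
import numpy as np
from scipy.optimize import nnls

# Recover per-source PSDs at one frequency bin from beamformer output
# powers, then form a Wiener post-filter gain. More sources (4) than
# microphones (3), as in the under-determined case; geometry, directions
# and source powers are assumptions, and source directions are taken as known.

c, f = 343.0, 2000.0
M = 3                                        # microphones
mic = np.stack([np.linspace(-0.05, 0.05, M), np.zeros(M)], axis=1)

def steering(az):
    """Far-field steering vector for azimuth az (radians)."""
    direction = np.array([np.cos(az), np.sin(az)])
    delays = mic @ direction / c
    return np.exp(-2j * np.pi * f * delays)

src_az = np.deg2rad([18.0, 70.0, 110.0, 162.0])
true_psd = np.array([1.0, 0.5, 2.0, 0.25])

# Delay-and-sum beamformers steered at the source directions.
beams = [steering(az).conj() / M for az in src_az]

# Beamformer output power = sum_j |w^H a_j|^2 * PSD_j (noise-free model).
A = np.array([[abs(w @ steering(az)) ** 2 for az in src_az] for w in beams])
powers = A @ true_psd                        # simulated measured powers

est_psd, _ = nnls(A, powers)                 # non-negative PSD estimate

# Wiener post-filter gain for extracting source 0 from its beam output.
wiener_gain = (A[0, 0] * est_psd[0]) / (A[0] @ est_psd)
print("estimated PSDs:", np.round(est_psd, 3))
print(f"Wiener post-filter gain for source 0: {wiener_gain:.3f}")
```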
Citations: 6
Sound event detection using non-negative dictionaries learned from annotated overlapping events
Pub Date : 2013-10-01 DOI: 10.1109/WASPAA.2013.6701861
O. Dikmen, A. Mesaros
Detection of overlapping sound events generally requires training class models either from separate data for each class or by making assumptions about the dominating events in the mixed signals. Methods based on sound source separation are currently used in this task, but involve the problem of assigning separated components to sources. In this paper, we propose a method which bypasses the need to build separate sound models. Instead, non-negative dictionaries for the sound content and their annotations are learned in a coupled sense. In the testing stage, time activations of the sound dictionary columns are estimated and used to reconstruct annotations using the annotation dictionary. The method requires no separate training data for classes and in general very promising results are obtained using only a small amount of data.
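A small sketch of the coupled-dictionary idea: stack spectral features and annotations, factor the stack with multiplicative-update NMF so both share one activation matrix, then at test time fix the audio dictionary, re-estimate activations, and reconstruct annotations. The synthetic data, matrix sizes, and iteration counts are assumptions rather than the paper's setup.

```python
import numpy as np

# Coupled non-negative dictionaries for audio features and annotations:
# train on the stacked matrix so both share activations, then at test time
# estimate activations from audio only and reconstruct annotations.
# Synthetic data, sizes and iteration counts are assumptions.

rng = np.random.default_rng(3)
n_feat, n_cls, n_comp, n_frames = 40, 3, 6, 400

# Synthetic training data: components 2k and 2k+1 are attributed to class k.
true_W = rng.random((n_feat, n_comp))
true_H = rng.random((n_comp, n_frames)) * (rng.random((n_comp, n_frames)) > 0.6)
V = true_W @ true_H + 1e-3                           # audio features
A = np.repeat(np.eye(n_cls), 2, axis=1) @ true_H     # annotation strengths

def nmf(X, k, iters=300):
    """Multiplicative-update NMF (Euclidean cost): X ~= W @ H."""
    W = rng.random((X.shape[0], k)) + 0.1
    H = rng.random((k, X.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Learn coupled dictionaries from the stacked matrix [V; A].
W_stack, _ = nmf(np.vstack([V, A]), n_comp)
W_audio, W_annot = W_stack[:n_feat], W_stack[n_feat:]

def detect(V_test, iters=300):
    """Fix the audio dictionary, estimate activations, map to annotations."""
    H = rng.random((n_comp, V_test.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W_audio.T @ V_test) / (W_audio.T @ W_audio @ H + 1e-9)
    return W_annot @ H                                # per-class activity

# First 50 training frames reused as a stand-in test set.
scores = detect(V[:, :50])
print("class activity for first test frame:", np.round(scores[:, 0], 2))
```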
Citations: 53