A polynomial interpolation-based scheme for reducing bandwidth in distributed speech recognition system
A. Touazi, M. Debyeche
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701880
In this paper, we propose a low bit-rate compression scheme for distributed speech recognition (DSR) systems based on polynomial interpolation. Dimensionality reduction of a set of successive Mel-frequency cepstral coefficients (MFCCs) is achieved by polynomial least-squares fitting. Conventional vector quantization (VQ) is then applied to the polynomial coefficients, achieving more than 58% bandwidth reduction compared to the ETSI advanced front-end (ETSI-AFE) encoder. Performance evaluation was conducted on the Aurora-2 database in clean and multi-condition training modes. With respect to ETSI-AFE, the proposed encoder shows no significant degradation in terms of overall recognition accuracy.
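The core compression idea above can be sketched in a few lines: fit a low-order least-squares polynomial to the trajectory of each MFCC dimension over a block of successive frames, and transmit only the polynomial coefficients. This is a minimal illustration, not the authors' exact codec; the block length and polynomial order below are illustrative choices.

```python
import numpy as np

def compress_trajectory(x, order=3):
    """Fit a degree-`order` polynomial to the frame trajectory x."""
    t = np.linspace(-1.0, 1.0, len(x))   # normalized frame index
    return np.polyfit(t, x, order)        # coefficients to quantize and send

def decompress_trajectory(coeffs, n_frames):
    t = np.linspace(-1.0, 1.0, n_frames)
    return np.polyval(coeffs, t)          # reconstructed trajectory

rng = np.random.default_rng(0)
block = np.cumsum(rng.normal(size=10))    # smooth-ish mock MFCC track, 10 frames
coeffs = compress_trajectory(block)       # 4 numbers instead of 10
recon = decompress_trajectory(coeffs, len(block))
print(coeffs.shape, np.mean((block - recon) ** 2))
```

The bandwidth saving comes from the ratio of polynomial order plus one to block length; the actual codec additionally vector-quantizes the coefficients.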
Hierarchical and coupled non-negative dynamical systems with application to audio modeling
Umut Simsekli, Jonathan Le Roux, J. Hershey
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701891
Many kinds of non-negative data, such as power spectra and count data, have been modeled using non-negative matrix factorization. Even though this modeling paradigm has yielded successful applications, it falls short when the data have certain hierarchical and temporal structure. In this study, we propose a novel dynamical system model that can handle these kinds of complex structures that often arise in non-negative data. We show that our model can be extended to handle heterogeneous data for data-driven regularization. We present convergence-guaranteed update rules for each latent factor. In order to assess the performance, we evaluate our model on the transcription of classical piano pieces, and show that it outperforms related models. We also illustrate that the performance can be further improved by making use of symbolic data.
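As background for the model above: plain NMF approximates a non-negative matrix V as WH and is classically fit with multiplicative updates that monotonically decrease the squared Euclidean error (Lee and Seung). The sketch below shows that baseline, which the paper's dynamical-system model extends with hierarchical and temporal structure; shapes and iteration count are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    """Baseline NMF via multiplicative updates for squared Euclidean error."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # non-negative dictionary
    H = rng.random((rank, T)) + eps   # non-negative activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(20, 30))) ** 2  # mock power spectra
W, H = nmf(V, rank=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)
```

The multiplicative form keeps W and H non-negative by construction, which is why variants of these updates reappear in the convergence-guaranteed rules the paper derives.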
A probabilistic line spectrum model for musical instrument sounds and its application to piano tuning estimation
François Rigaud, Angélique Dremeau, B. David, L. Daudet
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701879
The paper introduces a probabilistic model for the analysis of line spectra, defined here as a set of frequencies of spectral peaks with significant energy. This model is detailed in a general polyphonic audio framework and assumes that, for a time frame of the signal, the observations have been generated by a mixture of notes composed of partial and noise components. Observations corresponding to partial frequencies can provide some information on the musical instrument that generated them. In the case of piano music, the fundamental frequency and the inharmonicity coefficient are introduced as parameters for each note, and can be estimated from the line spectra parameters by means of an Expectation-Maximization algorithm. This technique is finally applied to the unsupervised estimation of the tuning and inharmonicity along the whole compass of a piano, from the recording of a musical piece.
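The note parameters mentioned above follow the standard stiff-string model: partial k of a note with fundamental f0 and inharmonicity coefficient B lies near f_k = k · f0 · sqrt(1 + B·k²). A small sketch, with illustrative values not taken from the paper:

```python
import numpy as np

def partial_frequencies(f0, B, n_partials):
    """Inharmonic partial frequencies of a stiff string: f_k = k*f0*sqrt(1+B*k^2)."""
    k = np.arange(1, n_partials + 1)
    return k * f0 * np.sqrt(1.0 + B * k**2)

f = partial_frequencies(f0=220.0, B=3e-4, n_partials=8)
# String stiffness stretches the upper partials sharp of the harmonic series:
ratios = f / (220.0 * np.arange(1, 9))
print(ratios)   # all above 1.0, and growing with partial index
```

Fitting f0 and B to observed peak frequencies (here via EM over the line-spectrum model) is what yields the tuning and inharmonicity estimates across the piano's compass.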
An LCMV filter for single-channel noise cancellation and reduction in the time domain
J. Jensen, J. Benesty, M. G. Christensen, Jingdong Chen
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701870
In this paper, we consider a recent class of optimal rectangular filtering matrices for single-channel speech enhancement. This class of filters exploits the fact that the dimension of the signal subspace is lower than that of the full space. The extra degrees of freedom in the filters, which are otherwise reserved for preserving the signal subspace, can then be used to achieve an improved output signal-to-noise ratio (SNR). Interestingly, these filters unify the ideas of optimal filtering and subspace methods. We propose an optimal LCMV filter in this framework with minimum output power that passes the desired signal undistorted and cancels correlated noise; such cancellation was not facilitated by the filters derived so far in this framework. The results show that the proposed filter can achieve output SNRs similar to those of competing filter designs, while having a much higher output signal-to-interference ratio. This is shown for both synthetic and real speech signals.
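For readers unfamiliar with the LCMV form: minimizing the output power h'Rh subject to linear constraints C'h = f has the closed-form solution h = R⁻¹C (C'R⁻¹C)⁻¹ f. The sketch below shows this generic solution (not the paper's specific rectangular filtering matrix); dimensions are illustrative.

```python
import numpy as np

def lcmv(R, C, f):
    """Generic LCMV solution: argmin h'Rh s.t. C'h = f."""
    Ri_C = np.linalg.solve(R, C)                   # R^{-1} C
    return Ri_C @ np.linalg.solve(C.T @ Ri_C, f)   # R^{-1}C (C'R^{-1}C)^{-1} f

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
R = A @ A.T + 8 * np.eye(8)       # SPD noisy-signal covariance
C = rng.normal(size=(8, 2))       # two constraint vectors
f = np.array([1.0, 0.0])          # pass the desired component, cancel the other
h = lcmv(R, C, f)
print(C.T @ h)                    # constraints are met: [1, 0] up to numerics
```

In the paper's setting one constraint passes the desired speech undistorted while another nulls the correlated noise component, which is where the distortionless response and cancellation both come from.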
Comparison of windowing in speech and audio coding
Tomas Bäckström
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701853
Over the last decade, speech and audio coding have converged toward an increasingly unified technology. This contribution discusses one of the remaining fundamental differences between the speech and audio paradigms, namely, windowing of the input signal. Audio codecs generally use lapped transforms and apply a perceptual model in the transform domain, whereby temporal continuity is achieved by windowing and overlap-add. Speech codecs, on the other hand, achieve temporal continuity by using linear predictive filtering, whereby windowing is applied in the residual domain. Despite these fundamental differences, we demonstrate that the two windowing approaches, combined with perceptual modeling, perform very similarly both in terms of perceptual quality and theoretical properties.
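The transform-codec windowing discussed above can be demonstrated concretely: a sine window at 50% overlap satisfies the Princen-Bradley condition w[n]² + w[n+N/2]² = 1, so windowed analysis followed by windowed overlap-add reconstructs the signal exactly away from the edges. A minimal sketch with an illustrative frame length:

```python
import numpy as np

N = 64                                            # frame length, 50% overlap
hop = N // 2
w = np.sin(np.pi * (np.arange(N) + 0.5) / N)      # sine window (Princen-Bradley)

x = np.random.default_rng(0).normal(size=N * 4)
y = np.zeros_like(x)
for start in range(0, len(x) - N + 1, hop):
    frame = w * x[start:start + N]                # analysis windowing
    y[start:start + N] += w * frame               # synthesis windowing + overlap-add

# Interior samples receive two overlapping contributions summing to w^2 + w^2 = 1:
err = np.max(np.abs(y[hop:-hop] - x[hop:-hop]))
print(err)                                        # numerically zero
```

Speech codecs reach the same continuity differently, by running windowing on the LPC residual, which is exactly the contrast the paper evaluates.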
Spectral feature-based nonlinear residual echo suppression
A. Schwarz, Christian Hofmann, Walter Kellermann
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701825
We propose a method for nonlinear residual echo suppression that consists of extracting spectral features from the far-end signal, and using an artificial neural network to model the residual echo magnitude spectrum from these features. We compare the modeling accuracy achieved by realizations with different features and network topologies, evaluating the mean squared error of the estimated residual echo magnitude spectrum. We also present a low-complexity real-time implementation combining an offline-trained network with online adaptation, and investigate its performance in terms of echo suppression and speech distortion for real mobile phone recordings.
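Once the residual echo magnitude spectrum is estimated (in the paper, by the neural network), the suppression stage is typically a per-bin spectral gain. The sketch below shows a generic Wiener-like gain with a spectral floor; the estimator is simply assumed given, and the floor value is an illustrative choice, not the paper's.

```python
import numpy as np

def suppress(Y_mag, R_hat, floor=0.1):
    """Attenuate the mic magnitude spectrum Y_mag per frequency bin,
    given an estimate R_hat of the residual-echo magnitude spectrum."""
    gain = 1.0 - (R_hat / np.maximum(Y_mag, 1e-12))
    return np.maximum(gain, floor) * Y_mag        # floor limits musical noise

Y = np.array([1.0, 0.8, 0.5, 0.2])                # observed magnitudes
R = np.array([0.2, 0.7, 0.1, 0.0])                # estimated residual echo
print(suppress(Y, R))                             # -> [0.8, 0.1, 0.4, 0.2]
```

The trade-off the paper measures — echo suppression versus speech distortion — is governed by how aggressive this gain is and how accurate the magnitude estimate is.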
Loudspeaker placement for sound field reproduction by constrained matching pursuit
H. Khalilian, I. Bajić, R. Vaughan
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701838
We describe a method for approximating a desired sound field in a cubic region using a planar array of omnidirectional loudspeakers. For this purpose, a constrained matching pursuit algorithm is employed to find the appropriate locations of the loudspeakers. Unlike previously proposed methods for sound field approximation, this iterative procedure attempts to approximate the residual error vector at each iteration, leading to a more efficient representation of the desired field as a linear combination of the Acoustic Transfer Functions (ATFs) of the selected loudspeakers. Simulations suggest that the new method offers considerable improvement in approximation accuracy compared to uniformly placed loudspeakers, as well as another recent method for loudspeaker placement.
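The greedy residual-fitting loop above can be sketched as matching-pursuit-style selection over candidate positions: at each iteration pick the ATF column most correlated with the current residual, then refit the gains of all selected loudspeakers by least squares. This is a simplified sketch with illustrative dimensions; the paper's specific constraints are omitted.

```python
import numpy as np

def select_loudspeakers(A, d, n_select):
    """A: (control points x candidate positions) ATF matrix; d: desired field."""
    chosen, r = [], d.copy()
    for _ in range(n_select):
        scores = np.abs(A.conj().T @ r)       # correlation with residual
        scores[chosen] = -np.inf              # never reuse a position
        chosen.append(int(np.argmax(scores)))
        g, *_ = np.linalg.lstsq(A[:, chosen], d, rcond=None)
        r = d - A[:, chosen] @ g              # residual after refitting gains
    return chosen, r

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 30))                 # 50 control points, 30 candidates
d = rng.normal(size=50)                       # mock desired field
chosen, r = select_loudspeakers(A, d, n_select=8)
print(len(chosen), np.linalg.norm(r) / np.linalg.norm(d))
```

Re-fitting against the residual at each step, rather than fixing earlier gains, is what makes the representation more efficient than a one-shot placement.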
Microphone multiplexing with diffuse noise model-based principal component analysis
Sonia Badar, Nobutaka Ono, L. Daudet
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701877
Reducing the total data throughput for microphone arrays is often necessary, especially when using very large arrays. However, what information can be lost depends on the processing task at the decoder level. In this paper, we investigate simple ways of linearly down-mixing the microphone signals into a reduced number of channels, using non-adaptive coefficients derived from a diffuse noise model, based only on the geometry of the array. In source separation experiments, this multiplexing scheme provides no significant loss in quality even with a high reduction in the number of transmission channels, and outperforms a multiplexing scheme with random coefficients. It furthermore introduces some robustness with respect to the microphone gains and angle from the sources.
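A geometry-only down-mix of the kind described above can be sketched as follows: under a diffuse noise model the coherence between two microphones at spacing d is sin(2πfd/c)/(2πfd/c), so a fixed mixing matrix can be taken as the top eigenvectors of this model coherence matrix, computed from the array geometry alone. Frequency, spacing, and channel counts below are illustrative, not the paper's.

```python
import numpy as np

c = 343.0                                      # speed of sound, m/s
f = 1000.0                                     # analysis frequency, Hz
pos = np.array([[0.05 * i, 0.0, 0.0] for i in range(8)])  # 8-mic line, 5 cm pitch

D = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)  # pairwise distances
Gamma = np.sinc(2 * f * D / c)                 # np.sinc(x) = sin(pi*x)/(pi*x)
eigval, eigvec = np.linalg.eigh(Gamma)         # eigenvalues in ascending order
T = eigvec[:, -3:].T                           # top-3 components: 8 -> 3 channels

x = np.random.default_rng(0).normal(size=(8, 100))  # mock microphone frames
y = T @ x                                      # down-mixed transmission channels
print(T.shape, y.shape)
```

Because T depends only on geometry, it never needs adaptation or transmission, which is what keeps the scheme simple and robust to gain mismatches.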
On the misalignment of stereophonic acoustic echo cancellation with decorrelation by resampling
Jason Wung, Ted S. Wada, M. Souden, B. Juang
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701811
It is well established that a decorrelation procedure is required in a multi-channel acoustic echo control system to mitigate the so-called non-uniqueness problem. A recently proposed technique that accomplishes decorrelation by resampling (DBR) has been shown to be advantageous; it achieves superior performance in echo reduction gain and offers the possibility of frequency-selective decorrelation to further preserve the sound quality of the system. In this paper, we rigorously analyze the performance of DBR in terms of coherence reduction and the resultant misalignment of an adaptive filter. We derive closed-form expressions for the performance bounds and validate the theoretical analysis with simulation.
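The DBR idea itself is simple to illustrate: slightly resample one channel of the far-end stereo signal so that the two loudspeaker signals are no longer perfectly coherent, which breaks the non-uniqueness of the stereo echo paths. The sketch below uses linear interpolation as a stand-in for a proper resampler, and the resampling ratio is an illustrative choice.

```python
import numpy as np

def resample(x, ratio):
    """Fractionally resample x by `ratio` using linear interpolation."""
    t_out = np.arange(0, len(x) - 1, ratio)
    return np.interp(t_out, np.arange(len(x)), x)

rng = np.random.default_rng(0)
left = rng.normal(size=2000)                 # mock far-end channel
right_dbr = resample(left, ratio=1.001)      # ~0.1% resampling offset

n = min(len(left), len(right_dbr))
rho = np.corrcoef(left[:n], right_dbr[:n])[0, 1]
print(rho)                                   # well below 1: coherence reduced
```

The paper's contribution is quantifying exactly how much coherence reduction such a ratio buys, and what filter misalignment it costs, in closed form.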
Adaptive distance and near-field compensation applied to microphones
W. Etter
Pub Date: 2013-10-01 | DOI: 10.1109/WASPAA.2013.6701856
In voice acquisition, variations of the microphone distance introduce not only level changes, but also frequency response changes due to the near-field effect. This paper presents a method for adaptive distance and near-field compensation based on the talker-to-microphone distance and the microphone polar pattern. If available, the microphone orientation and the critical distance associated with the room acoustics can be taken into account to further improve compensation accuracy. Aimed at teleconference use, the significance of the critical distance for compensation is discussed for office and conference rooms. An example of the performance of the algorithm is provided, in which a sensor is applied to continuously measure a varying microphone distance.
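The two effects named above can be sketched for an ideal pressure-gradient capsule: an overall level change proportional to 1/r, and the near-field bass boost of magnitude sqrt(1 + (c/(2πfr))²) (the proximity effect). A compensation gain undoes both relative to a reference distance. This is a hedged illustration of the physics, not the paper's algorithm; the reference distance and values are illustrative.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound, m/s

def compensation_gain(f, r, r_ref=0.5):
    """Magnitude gain at frequency f (Hz) that normalizes a pressure-gradient
    mic at distance r (m) toward its response at reference distance r_ref."""
    level = r / r_ref                                             # undo 1/r level change
    boost = np.sqrt(1.0 + (C_SOUND / (2 * np.pi * f * r)) ** 2)      # proximity boost at r
    boost_ref = np.sqrt(1.0 + (C_SOUND / (2 * np.pi * f * r_ref)) ** 2)
    return level * boost_ref / boost                              # undo the excess bass boost

f = np.array([100.0, 1000.0, 10000.0])
g = compensation_gain(f, r=0.1)
print(g)   # close talker: strong low-frequency cut, mild cut at high frequencies
```

As the measured distance varies, the gain curve is recomputed continuously, which is the "adaptive" part of the method; the polar pattern determines how strong the gradient (and thus proximity) term is.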