Relative transfer function modeling for supervised source localization

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; Pub Date: 2013-10-01; DOI: 10.1109/WASPAA.2013.6701829

Bracha Laufer-Goldshtein, R. Talmon, S. Gannot

Citations: 41

Abstract

Speaker localization is one of the most prevalent problems in speech processing. Despite significant efforts over the last decades, high reverberation levels still limit the performance of localization algorithms. Furthermore, with conventional localization methods, the information that can be extracted from dual-microphone measurements is restricted to the time difference of arrival (TDOA). Under the far-field regime, this is equivalent to estimating either the azimuth or the elevation angle. A full description of the speaker's coordinates necessitates several microphones. In this contribution, we tackle these two limitations by taking a manifold learning perspective on system identification. We present a training-based algorithm, motivated by the concept of diffusion maps, that aims at recovering the fundamental controlling parameters driving the measurements. This approach turns out to be more robust to reverberation and capable of recovering the speech source location using merely two microphone signals.
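The far-field limitation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' algorithm; it only shows the textbook far-field relation between a two-microphone TDOA and a single angle, tau = d * cos(theta) / c. The function names, the microphone spacing, and the speed of sound (343 m/s) are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def tdoa_from_angle(angle_deg, mic_distance):
    """Far-field TDOA in seconds for a source at angle_deg from the microphone axis."""
    return mic_distance * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND


def angle_from_tdoa(tdoa, mic_distance):
    """Invert the far-field model; returns the angle in degrees, in [0, 180]."""
    ratio = tdoa * SPEED_OF_SOUND / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp in case noise pushes |ratio| above 1
    return math.degrees(math.acos(ratio))


if __name__ == "__main__":
    d = 0.2  # hypothetical microphone spacing in metres
    tau = tdoa_from_angle(60.0, d)
    # The TDOA determines one angle only; source range does not enter the model.
    print(angle_from_tdoa(tau, d))
```

Because the model depends on a single angle, one microphone pair cannot separate range or a second angle from the TDOA alone, which is the restriction the training-based approach is meant to overcome.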


