
Latest Publications in IEEE Trans. Speech Audio Process.

ANIQUE: An Auditory Model for Single-Ended Speech Quality Estimation
Pub Date : 2005-08-15 DOI: 10.1109/TSA.2005.851924
Doh-Suk Kim
In predicting the subjective quality of a speech signal degraded by telecommunication networks, conventional objective models require not only the degraded speech but also a reference source speech signal, which is applied as an input to the network. Non-intrusive estimation of speech quality is a challenging problem in that only the degraded speech signal is available. Non-intrusive estimation can be used in many real applications where the source speech signal is not available. In this paper, we propose a new approach for non-intrusive speech quality estimation utilizing the temporal envelope representation of speech. The proposed auditory non-intrusive quality estimation (ANIQUE) model is based on the functional roles of the human auditory system and the characteristics of the human articulation system. Experimental evaluations on 35 different tests demonstrated the effectiveness of the proposed model.
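To make the notion of a temporal envelope representation concrete, the minimal Python/SciPy sketch below extracts the Hilbert envelope of one analysis band and its magnitude modulation spectrum. It is a generic illustration only, not the ANIQUE model: the paper's critical-band filterbank, modulation-frequency weighting, and articulation-based analysis are not reproduced, and the function name, band edges, and envelope rate are illustrative assumptions.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_temporal_envelope(x, fs, f_lo=1000.0, f_hi=1500.0, env_rate=100):
    """Temporal envelope of one analysis band: band-pass filter, Hilbert
    envelope, crude decimation to a low envelope rate, then the magnitude
    modulation spectrum of the windowed envelope."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, x)
    env = np.abs(hilbert(band))          # instantaneous (Hilbert) envelope
    step = max(1, int(fs // env_rate))
    env = env[::step]                    # simple decimation (no anti-alias filter), for illustration only
    mod_spectrum = np.abs(np.fft.rfft(env * np.hanning(len(env))))
    return env, mod_spectrum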
{"title":"ANIQUE: An Auditory Model for Single-Ended Speech Quality Estimation","authors":"Doh-Suk Kim","doi":"10.1109/TSA.2005.851924","DOIUrl":"https://doi.org/10.1109/TSA.2005.851924","url":null,"abstract":"In predicting subjective quality of speech signal degraded by telecommunication networks, conventional objective models require a reference source speech signal, which is applied as an input to the network, as well as the degraded speech. Non-intrusive estimation of speech quality is a challenging problem in that only the degraded speech signal is available. Non-intrusive estimation can be used in many real applications when source speech signal is not available. In this paper, we propose a new approach for non-intrusive speech quality estimation utilizing the temporal envelope representation of speech. The proposed auditory non-intrusive quality estimation (ANIQUE) model is based on the functional roles of human auditory systems and the characteristics of human articulation systems. Experimental evaluations on 35 different tests demonstrated the effectiveness of the proposed model.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"153 1","pages":"821-831"},"PeriodicalIF":0.0,"publicationDate":"2005-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86039932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 115
Combination of autocorrelation-based features and projection measure technique for speaker identification
Pub Date : 2005-06-20 DOI: 10.1109/TSA.2005.848893
Kuo-Hwei Yuo, Tai-Hwei Hwang, Hsiao-Chuan Wang
This paper presents a robust approach for speaker identification when the speech signal is corrupted by additive noise and channel distortion. Robust features are derived by assuming that the corrupting noise is stationary and that the channel effect is fixed during an utterance. A two-step temporal filtering procedure on the autocorrelation sequence is proposed to minimize the effect of additive and convolutional noises. The first step applies a temporal filtering procedure in the autocorrelation domain to remove the additive noise, and the second step performs mean subtraction on the filtered autocorrelation sequence in the logarithmic spectrum domain to remove the channel effect. No prior knowledge of the noise characteristics is necessary, and the additive noise can be a colored noise. The proposed robust feature is then combined with the projection measure technique to gain a further improvement in recognition accuracy. Experimental results show that the proposed method can significantly improve the performance of the speaker identification task in noisy environments.
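A heavily simplified Python sketch of the two-step idea follows. A first-order difference across frames stands in for the paper's temporal filter in the autocorrelation domain (the contribution of stationary noise to the autocorrelation is roughly constant across frames, so it is attenuated), and the channel effect is handled by log-spectral mean subtraction. The actual framing, filter design, and final cepstral features of the method differ; the function name and choices here are illustrative assumptions.

import numpy as np

def autocorr_robust_logspec(frames):
    """frames: (T, N) windowed speech frames.
    Step 1: across-frame filtering of each autocorrelation lag (here a simple
            first-order difference), suppressing the roughly constant
            contribution of stationary additive noise.
    Step 2: utterance-level mean subtraction of the log spectrum, removing a
            fixed channel (convolutional) effect."""
    T, N = frames.shape
    # one-sided autocorrelation of each frame
    ac = np.array([np.correlate(f, f, mode="full")[N - 1:] for f in frames])
    # step 1: temporal high-pass filtering across frames (simple stand-in)
    ac_filt = np.diff(ac, axis=0, prepend=ac[:1])
    # step 2: log spectrum of the filtered autocorrelation, then mean subtraction
    spec = np.abs(np.fft.rfft(ac_filt, axis=1)) + 1e-12
    log_spec = np.log(spec)
    return log_spec - log_spec.mean(axis=0, keepdims=True)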
{"title":"Combination of autocorrelation-based features and projection measure technique for speaker identification","authors":"Kuo-Hwei Yuo, Tai-Hwei Hwang, Hsiao-Chuan Wang","doi":"10.1109/TSA.2005.848893","DOIUrl":"https://doi.org/10.1109/TSA.2005.848893","url":null,"abstract":"This paper presents a robust approach for speaker identification when the speech signal is corrupted by additive noise and channel distortion. Robust features are derived by assuming that the corrupting noise is stationary and the channel effect is fixed during an utterance. A two-step temporal filtering procedure on the autocorrelation sequence is proposed to minimize the effect of additive and convolutional noises. The first step applies a temporal filtering procedure in autocorrelation domain to remove the additive noise, and the second step is to perform the mean subtraction on the filtered autocorrelation sequence in logarithmic spectrum domain to remove the channel effect. No prior knowledge of noise characteristic is necessary. The additive noise can be a colored noise. Then the proposed robust feature is combined with the projection measure technique to gain further improvement in recognition accuracy. Experimental results show that the proposed method can significantly improve the performance of speaker identification task in noisy environment.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"34 1","pages":"565-574"},"PeriodicalIF":0.0,"publicationDate":"2005-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91163988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Rapid online adaptation based on transformation space model evolution
Pub Date : 2005-02-22 DOI: 10.1109/TSA.2004.841427
Dong Kook Kim, N. Kim
This paper presents a new approach to online linear regression adaptation of continuous density hidden Markov models based on transformation space model (TSM) evolution. The TSM, which characterizes the a priori knowledge of the training speakers associated with the maximum likelihood linear regression matrix parameters, is effectively described in terms of latent variable models such as factor analysis or probabilistic principal component analysis. The TSM provides various sources of information, such as the correlation information, the prior distribution, and the prior knowledge of the regression parameters, that are very useful for rapid adaptation. A quasi-Bayes estimation algorithm is formulated to incrementally update the hyperparameters of the TSM and the regression matrices simultaneously. The proposed TSM evolution is a general framework with batch TSM adaptation as a special case. Experiments on supervised speaker adaptation demonstrate that the proposed approach is more effective than the conventional quasi-Bayes linear regression technique when only a small amount of adaptation data is available.
{"title":"Rapid online adaptation based on transformation space model evolution","authors":"Dong Kook Kim, N. Kim","doi":"10.1109/TSA.2004.841427","DOIUrl":"https://doi.org/10.1109/TSA.2004.841427","url":null,"abstract":"This paper presents a new approach to online linear regression adaptation of continuous density hidden Markov models based on transformation space model (TSM) evolution. The TSM which characterizes the a priori knowledge of the training speakers associated with maximum likelihood linear regression matrix parameters is effectively described in terms of the latent variable models such as the factor analysis or probabilistic principal component analysis. The TSM provides various sources of information such as the correlation information, the prior distribution, and the prior knowledge of the regression parameters that are very useful for rapid adaptation. The quasi-Bayes estimation algorithm is formulated to incrementally update the hyperparameters of the TSM and regression matrices simultaneously. The proposed TSM evolution is a general framework with batch TSM adaptation as a special case. Experiments on supervised speaker adaptation demonstrate that the proposed approach is more effective compared with the conventional quasi-Bayes linear regression technique when a small amount of adaptation data is available.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"67 1","pages":"194-202"},"PeriodicalIF":0.0,"publicationDate":"2005-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83437925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Crosstalk resilient interference cancellation in microphone arrays using Capon beamforming
Pub Date : 2004-08-16 DOI: 10.1109/TSA.2004.833011
Wing-Kin Ma, P. Ching, B. Vo
This paper studies a reference-assisted approach to interference canceling (IC) in microphone array systems. Conventionally, reference-assisted IC is based on the zero-crosstalk assumption, i.e., that the desired source signal is absent from the reference microphones. In applications where crosstalk is inevitable, the conventional IC approach usually exhibits degraded performance due to cancellation of the desired signal. In this paper, we develop a crosstalk-resilient IC method based on the Capon beamforming technique. The proposed beamformer deals with the uncertainty of crosstalk by applying a constraint on the worst-case crosstalk magnitude. The proposed beamformer not only performs IC but also provides blind beamforming of the desired signal. We show that a blind beamformer based on the traditional minimum mean-square error (MMSE) IC method is a special case of the proposed beamformer. One key step in implementing the proposed Capon beamformer lies in solving a difficult nonconvex optimization problem, and we illustrate how the Capon optimal solution can be effectively approximated using the so-called semidefinite relaxation algorithm. Simulation results demonstrate that the proposed beamformer is more robust against crosstalk-induced signal cancellation than beamformers based on the MMSE-IC methods.
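For reference, the standard narrowband Capon (MVDR) beamformer that this work builds on can be written in a few lines. The Python sketch below is only that baseline, with diagonal loading added for numerical robustness; it does not implement the paper's worst-case crosstalk constraint or the semidefinite relaxation used to solve it, and the function and variable names are illustrative.

import numpy as np

def capon_weights(R, d, loading=1e-3):
    """Standard narrowband Capon (MVDR) weights: minimize w^H R w subject to
    w^H d = 1, with a small diagonal loading term for numerical robustness.
    R: (M, M) spatial covariance matrix, d: length-M steering vector."""
    M = R.shape[0]
    R_loaded = R + loading * (np.trace(R).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# Applying the weights to frequency-domain snapshots x of shape (M, T):
# y = capon_weights(R, d).conj() @ x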
{"title":"Crosstalk resilient interference cancellation in microphone arrays using Capon beamforming","authors":"Wing-Kin Ma, P. Ching, B. Vo","doi":"10.1109/TSA.2004.833011","DOIUrl":"https://doi.org/10.1109/TSA.2004.833011","url":null,"abstract":"This paper studies a reference-assisted approach for interference canceling (IC) in microphone array systems. Conventionally, reference-assisted IC is based on the zero crosstalk assumption; i.e., when the desired source signal is absent in the reference microphones. In applications where crosstalk is inevitable, the conventional IC approach usually exhibits degraded performance due to cancellation of the desired signal. In this paper, we develop a crosstalk resilient IC method based on the Capon beamforming technique. The proposed beamformer deals with the uncertainty of crosstalk by applying a constraint on the worst-case crosstalk magnitude. The proposed beamformer not only performs IC, it also provides blind beamforming of the desired signal. We show that a blind beamformer based on the traditional minimum-mean-square-error (MMSE) IC method is a special case of the proposed beamformer. One key step of implementing the proposed Capon beamformer lies in solving a difficult nonconvex optimization problem, and we illustrate how the Capon optimal solution can be effectively approximated using the so-called semidefinite relaxation algorithm. Simulation results demonstrate that the proposed beamformer is more robust against crosstalk-induced signal cancellation than beamformers based on the MMSE-IC methods.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"2 1","pages":"468-477"},"PeriodicalIF":0.0,"publicationDate":"2004-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76642765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Introduction to the Special Issue on Multichannel Signal Processing for Audio and Acoustics Applications
Pub Date : 2004-08-16 DOI: 10.1109/TSA.2004.833716
Walter Kellermann, M. Sondhi, D. DeVries
The IEEE Signal Processing Society has its roots in an area where acoustics, speech, and signal processing converge, as was reflected in the former name of the society when it was founded in 1974. The interface between acoustics, speech, and signal processing is still an area of great interest to the society, with many fundamental problems still unsolved. Research is driven by applications where acoustic signals have to be captured, transmitted, and/or reproduced in an acoustic environment that includes echoes, noise, and reverberation. Considering human/machine interfaces as a major area of applications, it is obvious that signal processing becomes more challenging as the distance between humans and the machines increases, as the signal bandwidth increases, and as the acoustic environment becomes more complex and hostile. Increasingly sophisticated algorithms have been developed since the mid-1970s, and along with the availability of greatly increased and affordable computational power, multichannel signal processing algorithms naturally evolved for exploiting the spatial dimension of acoustic signals. The importance and popularity of this field were well reflected by the large number of submissions to this special issue. The volume of high-quality papers could not be fitted into the page budget allotted to us. Thus, we regrettably had to decide to publish some of them in a second section of this special issue as part of a regular issue of the TRANSACTIONS in early 2005. For sound reproduction, where we want to provide a pair of desired signals at the listeners' ear drums, seamless human/machine interfaces based on multichannel techniques have been implemented since the invention of stereo systems. However, providing the true spatial sound experience in large listening spaces became possible only with new multichannel signal processing techniques, such as wavefield synthesis. Still, major challenges remain, especially phase-true equalization of listening room acoustics and the cancellation of local noise sources and interferers. On the other hand, acquisition of audio and speech signals has been a research topic since the invention of the microphone and still today presents major challenges for the signal processing community. Structurally the simplest problem, the acoustic feedback from loudspeakers into microphones, is addressed by acoustic echo cancellation: from the single-channel case, which has been investigated since the 1970s, research has moved on to stereo and multichannel reproduction, recently culminating in a new wave-domain adaptive filtering concept which has been presented for the first time at ICASSP 2004. For removing unwanted interference and noise from desired signals, multichannel techniques utilize spatial diversity to discriminate between desired and undesired components, either by exploiting different spatial coherence properties or by beamforming, which directs a beam of increased sensitivity towards the desired source. For tr
{"title":"Introduction to the Special Issue on Multichannel Signal Processing for Audio and Acoustics Applications","authors":"Walter Kellermann, M. Sondhi, D. DeVries","doi":"10.1109/TSA.2004.833716","DOIUrl":"https://doi.org/10.1109/TSA.2004.833716","url":null,"abstract":"HE IEEE Signal Processing Society has its roots in an area where acoustics, speech, and signal processing converge, as was reflected in the former name of the society when it was founded in 1974. The interface between acoustics, speech, and signal processing is still an area of great interest to the society, with many fundamental problems still unsolved. Research is driven by applications where acoustic signals have to be captured, transmitted, and/or reproduced in an acoustic environment that includes echoes, noise, and reverberation Considering human/machine interfaces as a major area of applications, it is obvious that signal processing becomes more challenging as the distance between humans and the machines increases, as the signal bandwidth increases, and as the acoustic environment becomes more complex and hostile. Increasingly sophisticated algorithms have been developed since the mid-1970s and along with the availability of greatly increased and affordable computational power, multichannel signal processing algorithms naturally evolved for exploiting the spatial dimension of acoustic signals. The importance and popularity of this field was well reflected by the large number of submissions to this special issue. The volume of high-quality papers could not be fitted into the page budget allotted to us. Thus, we regrettably had to decide to publish some of them in a second section of this special issue as part of a regular issue of the TRANSACTIONS in early 2005. For sound reproduction, where we want to provide a pair of desired signals at the listeners’ ear drums, seamless human/machine interfaces based on multichannel techniques have been implemented since the invention of stereo systems. However, providing the true spatial sound experience in large listening spaces became possible only with new multichannel signal processing techniques, such as wavefield synthesis. Still, major challenges remain, especially phase-true equalization of listening room acoustics and the cancellation of local noise sources and interferers. On the other hand, acquisition of audio and speech signals has been a research topic since the invention of the microphone and still today presents major challenges for the signal processing community. Structurally the simplest problem, the acoustic feedback from loudspeakers into microphones is addressed by acoustic echo cancellation: From the single-channel case which has been investigated since the 1970s, research has moved on to stereo and multichannel reproduction, recently culminating in a new wave-domain adaptive filtering concept which has been presented for the first time at ICASSP 2004. For removing unwanted interference and noise from desired signals, multichannel techniques utilize spatial diversity to discriminate between desired and undesired components, either by exploiting different spatial coherence properties or by beamforming, which directs a beam of increased sensitivity towards the desired source. For tr","PeriodicalId":13155,"journal":{"name":"IEEE Trans. 
Speech Audio Process.","volume":"11 1","pages":"449-450"},"PeriodicalIF":0.0,"publicationDate":"2004-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87052996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Introduction to the Special Issue on Spontaneous Speech Processing
Pub Date : 2004-06-21 DOI: 10.1109/TSA.2004.828628
S. Furui, M. Beckman, Julia Hirschberg, S. Itahashi, Tatsuya Kawahara, Satoshi Nakamura, Shrikanth S. Narayanan
{"title":"Introduction to the Special Issue on Spontaneous Speech Processing","authors":"S. Furui, M. Beckman, Julia Hirschberg, S. Itahashi, Tatsuya Kawahara, Satoshi Nakamura, Shrikanth S. Narayanan","doi":"10.1109/TSA.2004.828628","DOIUrl":"https://doi.org/10.1109/TSA.2004.828628","url":null,"abstract":"","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"5 1","pages":"349-350"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72895375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
From the Editor-in-Chief
Pub Date : 2004-01-01 DOI: 10.1109/TSA.2004.837946
I. Trancoso
{"title":"From the Editor-in-Chief","authors":"I. Trancoso","doi":"10.1109/TSA.2004.837946","DOIUrl":"https://doi.org/10.1109/TSA.2004.837946","url":null,"abstract":"","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"3 1","pages":"553"},"PeriodicalIF":0.0,"publicationDate":"2004-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87631267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Source localization in reverberant environments: modeling and statistical analysis
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818027
T. Gustafsson, B. Rao, M. Trivedi
Room reverberation is typically the main obstacle to designing robust microphone-based source localization systems. The purpose of the paper is to analyze the achievable performance of acoustical source localization methods when room reverberation is present. To facilitate the analysis, we apply well-known results from room acoustics to develop a simple but useful statistical model for the room transfer function. The properties of the statistical model are found to correlate well with results from real data measurements. The room transfer function model is further applied to analyze the statistical properties of some existing methods for source localization. In this respect, we consider especially the asymptotic error variance and the probability of an anomalous estimate. A noteworthy outcome of the analysis is that the so-called PHAT time-delay estimator is shown to be optimal among a class of cross-correlation based time-delay estimators. To verify our results on the error variance and the outlier probability, we apply the image method for simulation of the room transfer function.
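Since the analysis singles out the PHAT weighting, a minimal Python implementation of the GCC-PHAT time-delay estimator for two microphone signals is sketched below. It follows the standard textbook formulation rather than anything specific to this paper; the function name and parameters are illustrative.

import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of x relative to y in seconds (positive if x arrives
    later) using the PHAT-weighted generalized cross-correlation."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    delay_samples = np.argmax(np.abs(cc)) - max_shift
    return delay_samples / float(fs)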
{"title":"Source localization in reverberant environments: modeling and statistical analysis","authors":"T. Gustafsson, B. Rao, M. Trivedi","doi":"10.1109/TSA.2003.818027","DOIUrl":"https://doi.org/10.1109/TSA.2003.818027","url":null,"abstract":"Room reverberation is typically the main obstacle for designing robust microphone-based source localization systems. The purpose of the paper is to analyze the achievable performance of acoustical source localization methods when room reverberation is present. To facilitate the analysis, we apply well known results from room acoustics to develop a simple but useful statistical model for the room transfer function. The properties of the statistical model are found to correlate well with results from real data measurements. The room transfer function model is further applied to analyze the statistical properties of some existing methods for source localization. In this respect we consider especially the asymptotic error variance and the probability of an anomalous estimate. A noteworthy outcome of the analysis is that the so-called PHAT time-delay estimator is shown to be optimal among a class of cross-correlation based time-delay estimators. To verify our results on the error variance and the outlier probability we apply the image method for simulation of the room transfer function.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"144 1","pages":"791-803"},"PeriodicalIF":0.0,"publicationDate":"2003-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73441635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 167
Robust time delay estimation exploiting redundancy among multiple microphones
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818025
Jingdong Chen, J. Benesty, Yiteng Huang
To find the position of an acoustic source in a room, a set of relative delays among different microphone pairs typically needs to be determined. The generalized cross-correlation (GCC) method is the most popular way to do so and is well explained in a landmark paper by Knapp and Carter. In this paper, the idea of the cross-correlation coefficient between two random signals is generalized to the multichannel case by using the notion of spatial prediction. The multichannel spatial correlation matrix is then deduced and its properties are discussed. We then propose a new method based on the multichannel spatial correlation matrix for time delay estimation. It is shown that this new approach can take advantage of the redundancy when more than two microphones are available, and this redundancy can help the estimator better cope with noise and reverberation.
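One common way to exploit the multichannel spatial correlation matrix for delay estimation is to search for the delay that makes the aligned microphone signals most mutually coherent, for example by minimizing the determinant of their normalized correlation matrix. The Python sketch below follows that idea under the assumptions of an equispaced linear array and a far-field source, so all relative delays are integer multiples of a single inter-microphone delay; the exact parameterization and windowing used in the paper may differ, and the function name is illustrative.

import numpy as np

def mccc_delay(frames, fs, max_delay_s):
    """frames: (M, N) array, one row per microphone.
    Assumes N > 2 * (M - 1) * max_delay_s * fs so a common analysis window exists."""
    M, N = frames.shape
    max_shift = int(max_delay_s * fs)
    guard = (M - 1) * max_shift              # room needed for the largest alignment
    L = N - 2 * guard                        # analysis window common to every candidate delay
    best_tau, best_score = 0, -np.inf
    for tau in range(-max_shift, max_shift + 1):
        # shift microphone m by m * tau samples, then correlate the aligned segments
        segs = [frames[m, guard + m * tau: guard + m * tau + L] for m in range(M)]
        R = np.corrcoef(np.array(segs))      # normalized spatial correlation matrix
        score = 1.0 - np.linalg.det(R)       # large when the aligned signals are coherent
        if score > best_score:
            best_score, best_tau = score, tau
    return best_tau / fs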
{"title":"Robust time delay estimation exploiting redundancy among multiple microphones","authors":"Jingdong Chen, J. Benesty, Yiteng Huang","doi":"10.1109/TSA.2003.818025","DOIUrl":"https://doi.org/10.1109/TSA.2003.818025","url":null,"abstract":"To find the position of an acoustic source in a room, typically, a set of relative delays among different microphone pairs needs to be determined. The generalized cross-correlation (GCC) method is the most popular to do so and is well explained in a landmark paper by Knapp and Carter. In this paper, the idea of cross-correlation coefficient between two random signals is generalized to the multichannel case by using the notion of spatial prediction. The multichannel spatial correlation matrix is then deduced and its properties are discussed. We then propose a new method based on the multichannel spatial correlation matrix for time delay estimation. It is shown that this new approach can take advantage of the redundancy when more than two microphones are available and this redundancy can help the estimator to better cope with noise and reverberation.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"1 1","pages":"549-557"},"PeriodicalIF":0.0,"publicationDate":"2003-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88299346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 158
Robust recognition of children's speech
Pub Date : 2003-11-01 DOI: 10.1109/TSA.2003.818026
A. Potamianos, Shrikanth S. Narayanan
Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variability poses challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency scaling of spectral envelope parameters are demonstrated. Recognition experiments using acoustic models trained on adult speech and tested against speech from children of various ages clearly show performance degradation with decreasing age. On average, the word error rates are two to five times worse for children's speech than for adult speech. Various techniques for improving ASR performance on children's speech are reported. A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for child speakers (by 25-45% under various model training and testing conditions). The use of age-dependent acoustic models further reduces the word error rate by 10%. The potential of using piecewise-linear and phoneme-dependent frequency warping algorithms for reducing the variability in the acoustic feature space of children is also investigated.
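The speaker normalization described here is in the family of vocal tract length normalization by frequency warping. The Python sketch below shows one common piecewise-linear warp of the frequency axis: scale by a factor alpha below a breakpoint, then follow a linear segment that keeps the Nyquist frequency fixed. The breakpoint, the warp-factor search, and the phoneme-dependent variants studied in the paper are not reproduced; the names and defaults are illustrative assumptions.

import numpy as np

def piecewise_linear_warp(freqs, alpha, f_nyq, f_break_ratio=0.85):
    """Warp frequencies by alpha below a breakpoint, then linearly so that the
    Nyquist frequency maps onto itself (a common VTLN-style scheme)."""
    f_break = f_break_ratio * f_nyq
    return np.where(
        freqs <= f_break,
        alpha * freqs,
        alpha * f_break
        + (f_nyq - alpha * f_break) * (freqs - f_break) / (f_nyq - f_break),
    )

# In a typical VTLN setup, the warp would be applied to the mel filterbank
# center frequencies, with alpha chosen per speaker by a maximum-likelihood grid search.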
{"title":"Robust recognition of children's speech","authors":"A. Potamianos, Shrikanth S. Narayanan","doi":"10.1109/TSA.2003.818026","DOIUrl":"https://doi.org/10.1109/TSA.2003.818026","url":null,"abstract":"Developmental changes in speech production introduce age-dependent spectral and temporal variability in the speech signal produced by children. Such variabilities pose challenges for robust automatic recognition of children's speech. Through an analysis of age-related acoustic characteristics of children's speech in the context of automatic speech recognition (ASR), effects such as frequency scaling of spectral envelope parameters are demonstrated. Recognition experiments using acoustic models trained from adult speech and tested against speech from children of various ages clearly show performance degradation with decreasing age. On average, the word error rates are two to five times worse for children speech than for adult speech. Various techniques for improving ASR performance on children's speech are reported. A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for children speakers (by 25-45% under various model training and testing conditions). The use of age-dependent acoustic models further reduces word error rate by 10%. The potential of using piece-wise linear and phoneme-dependent frequency warping algorithms for reducing the variability in the acoustic feature space of children is also investigated.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"7 1","pages":"603-616"},"PeriodicalIF":0.0,"publicationDate":"2003-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76983919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 213