IEEE Trans. Speech Audio Process.: Latest Articles

The multimode transform predictive coding paradigm
Pub Date : 2003-04-15 DOI: 10.1109/TSA.2003.809195
S. Ramprashad
Presented is a new coding paradigm, multimode transform predictive coding (MTPC), which combines speech and audio coding principles in a single coding structure. The paradigm is an adaptive coding paradigm which automatically adjusts how different coding modules are used based on the input signal. This allows MTPC coders to robustly handle a wider range of signals than single configuration (mode) transform predictive coding (TPC) designs. A wideband MTPC coder design targeting two-way communication applications and bitrates from 13 to 40 kbit/s is also presented. Subjective absolute category rating test results on speech, speech in noise and music demonstrate that the performance at 16, 24 and 32 kbit/s meets or exceeds that of ITU-T Rec. G.722 at 48, 56 and 64 kbit/s respectively for many coding conditions. Subjective Reference-ABx (R-ABx) tests are also included to show the potential advantages of the multimode coder over a single mode TPC coder. Finally, possible improvements in the MTPC coder design for applications such as broadcasting, which are less sensitive to delay and encoder complexity, are discussed.
Pages: 117-129
Citations: 17
Iterated partitioned block frequency-domain adaptive filtering for acoustic echo cancellation
Pub Date : 2003-04-15 DOI: 10.1109/TSA.2003.809194
K. Eneman, M. Moonen
For high-quality acoustic echo cancellation, long echoes have to be suppressed. Classical LMS-based adaptive filters are not attractive, as they are suboptimal from a computational point of view. Multirate adaptive filters such as the partitioned block frequency-domain adaptive filter (PBFDAF) are good alternatives and are widely used in commercial echo cancellers nowadays. In this paper the PBFDRAP is analyzed, which combines frequency-domain adaptive filtering with so-called "row action projection." Fast versions of the algorithm are derived, and it is shown that the PBFDRAP outperforms the PBFDAF in a realistic echo cancellation setup.
Pages: 143-158
Citations: 5
Optimizing feature extraction for speech recognition
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.805644
Chulhee Lee, Donghoon Hyun, E. Choi, Jinwook Go, Chungyong Lee
We propose a method to minimize the loss of information during the feature extraction stage in speech recognition by optimizing the parameters of the mel-cepstrum transformation, a transform which is widely used in speech recognition. Typically, the mel-cepstrum is obtained by critical band filters whose characteristics play an important role in converting a speech signal into a sequence of vectors. First, we analyze the performance of the mel-cepstrum by changing the parameters of the filters such as shape, center frequency, and bandwidth. Then we propose an algorithm to optimize the parameters of the filters using the simplex method. Experiments with Korean digit words show that the recognition rate improved by about 4-7%.
Pages: 80-87
Citations: 56
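The abstract above treats the shape, center frequency, and bandwidth of the critical band filters as tunable parameters. As a rough illustration of what that parameterization can look like, here is a minimal sketch of a triangular filter bank in plain Python. The mel formula `2595 * log10(1 + f / 700)`, the triangular shape, and all function names are assumptions chosen for illustration; the listing does not say which mel convention or filter shapes the authors used, and the simplex optimization step is not shown.

```python
import math


def hz_to_mel(f_hz):
    # One common mel-scale convention (an assumption, not taken from the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_centers(n_filters, f_min_hz, f_max_hz):
    # Center frequencies equally spaced on the mel axis, strictly inside the band.
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def triangular_filter(center_hz, bandwidth_hz, freqs_hz):
    # Triangle that peaks at 1.0 at center_hz and reaches 0.0 at
    # center_hz +/- bandwidth_hz / 2. bandwidth_hz must be positive.
    half = bandwidth_hz / 2.0
    return [max(0.0, 1.0 - abs(f - center_hz) / half) for f in freqs_hz]


def filterbank_log_energies(power_spectrum, freqs_hz, centers_hz, bandwidths_hz):
    # Weighted sum of the power spectrum under each filter, then a log.
    # The small constant only guards against log(0).
    energies = []
    for center, bandwidth in zip(centers_hz, bandwidths_hz):
        weights = triangular_filter(center, bandwidth, freqs_hz)
        energy = sum(w * p for w, p in zip(weights, power_spectrum))
        energies.append(math.log(energy + 1e-12))
    return energies
```

In a sketch like this, `centers_hz` and `bandwidths_hz` are the quantities an optimizer would adjust, with recognition accuracy on held-out data as the objective.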
A formant filtered physical model for wind instruments
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.807351
A. Nackaerts, B. Moor, R. Lauwereins
We report on our research concerning the calibration of physical models for sound synthesis. We combine waveguide physical modeling synthesis with formant filtering, by dividing the nonlinear description of the reed mechanism into a nonlinear part and an input-dependent linear filter. We elaborate on the calibration of the model and assess its performance by comparing it to a single-reed, cylindrical bore instrument, the clarinet.
Pages: 36-44
Citations: 2
A robust online secondary path modeling method with auxiliary noise power scheduling strategy and norm constraint manipulation
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2003.805643
Ming Zhang, H. Lan, W. Ser
In many practical cases for active noise control (ANC), the online secondary path modeling methods that use auxiliary noise are often applied. However, the auxiliary noise contributes to residual noise, and thus deteriorates the noise control performance of ANC systems. Moreover, a sudden and large change in the secondary path leads to easy divergence of the existing online secondary path modeling methods. To mitigate these problems, this paper proposes a new online secondary path modeling method with auxiliary noise power scheduling and adaptive filter norm manipulation. The auxiliary noise power is scheduled based on the convergence status of an ANC system with consideration of the variation of the primary noise. The purpose is to alleviate the increment of the residual noise due to the auxiliary noise. In addition, the norm manipulation is applied to adaptive filters in the ANC system. The objective is to avoid over-updates of adaptive filters due to the sudden large change in the secondary path and thus prevent the ANC system from diverging. Computer simulations show the effectiveness and robustness of the proposed method.
Pages: 45-53
Citations: 86
Noise reduction and echo cancellation front-end for speech codecs
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.807350
F. Basbug, K. Swaminathan, S. Nandkumar
We present an enhancement front-end for speech codecs, which consists of the integrated elements of noise reduction and echo cancellation. By including these elements, the front-end performs the task of mitigating the objectionable effects of the two major factors, i.e., noise and echo, which adversely affect the quality of most transmission systems, especially when low bit rate codecs are used. The use of this front-end is demonstrated with the 7.4 kbps IS-641 codec (enhanced full-rate standard for IS-136 systems). The integrated speech-processing unit has the advantage of utilizing the synergy among its components: the voice activity detector in the speech codec, the noise reduction, and the echo canceller. This synergy manifests itself both in the form of a reduction of the overall computational complexity by the use of a number of shared elements among the unit's various components, as well as an improved performance resulting from these components working together. The system displays high performance in both clean and noisy environments and it works well with low bit rate codecs.
Pages: 1-13
Citations: 24
On the computation of the Kullback-Leibler measure for spectral distances
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.805641
R. Veldhuis, E. Klabbers
Efficient algorithms for the exact and approximate computation of the symmetrical Kullback-Leibler (1998) measure for spectral distances are presented for linear predictive coding (LPC) spectra. An interpretation of this measure is given in terms of the poles of the spectra. The performances of the algorithms in terms of accuracy and computational complexity are assessed for the application of computing concatenation costs in unit-selection-based speech synthesis. With the same complexity and storage requirements, the exact method is superior in terms of accuracy.
Pages: 100-103
Citations: 29
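For orientation, the symmetrical Kullback-Leibler measure between two spectra P and Q, each normalized to unit total, can be written as the sum over frequency of (P - Q) log(P / Q). The sketch below evaluates it by brute force on a frequency grid for two all-pole (LPC) spectra. This is not the efficient exact or approximate algorithm the abstract refers to; the grid size, the normalization to unit sum, and the function names are assumptions made for illustration.

```python
import cmath
import math


def lpc_power_spectrum(a, n_points=512):
    # a = [1, a1, ..., ap] are the coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    # Returns 1 / |A(e^{jw})|^2 sampled at n_points frequencies in [0, pi).
    spectrum = []
    for k in range(n_points):
        w = math.pi * k / n_points
        response = sum(coeff * cmath.exp(-1j * w * i) for i, coeff in enumerate(a))
        spectrum.append(1.0 / abs(response) ** 2)
    return spectrum


def normalize(p):
    # Scale the sampled spectrum so that it sums to one.
    total = sum(p)
    return [x / total for x in p]


def symmetric_kl(p, q):
    # Sum of (p - q) * log(p / q) over the grid, which equals KL(p||q) + KL(q||p)
    # for spectra normalized to unit sum. All samples must be strictly positive.
    p, q = normalize(p), normalize(q)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Usage: distance between two first-order all-pole spectra (hypothetical values).
spec_a = lpc_power_spectrum([1.0, -0.9])
spec_b = lpc_power_spectrum([1.0, -0.5])
distance = symmetric_kl(spec_a, spec_b)
```

Every term of the sum is non-negative, so the measure is zero only when the two normalized spectra coincide on the grid.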
Discriminative training of natural language call routers
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.807352
H. Kuo, Chin-Hui Lee
This paper shows how discriminative training can significantly improve classifiers used in natural language processing, using as an example the task of natural language call routing, where callers are transferred to desired departments based on natural spoken responses to an open-ended "How may I direct your call?" prompt. With vector-based natural language call routing, callers are transferred using a routing matrix trained on statistics of occurrence of words and word sequences in a training corpus. By re-training the routing matrix parameters using a minimum classification error criterion, a relative error rate reduction of 10-30% was achieved on a banking task. Increased robustness was demonstrated in that with 10% rejection, the error rate was reduced by 40%. Discriminative training also improves portability; we were able to train call routers with the highest known performance using as input only text transcription of routed calls, without any human intervention or knowledge about what terms are important or irrelevant for the routing task. This strategy was validated with both the banking task and a more difficult task involving calls to operators in the UK. The proposed formulation is applicable to algorithms addressing a broad range of speech understanding, information retrieval, and topic identification problems.
Pages: 24-35
Citations: 65
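The vector-based routing step described above can be pictured as follows: each destination gets a row of word statistics collected from training transcripts, and an incoming utterance is sent to the row it most resembles. The sketch below uses raw word counts and cosine similarity, which are assumptions made for illustration; the abstract does not state the term weighting or the scoring function, and the discriminative (minimum classification error) re-training of the matrix is not shown. The training sentences and destination names are hypothetical.

```python
import math
from collections import Counter


def build_routing_matrix(training_calls):
    # training_calls: list of (transcript, destination) pairs.
    # Returns {destination: Counter of word counts}, one "row" per destination.
    matrix = {}
    for text, destination in training_calls:
        matrix.setdefault(destination, Counter()).update(text.lower().split())
    return matrix


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(matrix, utterance):
    # Score the utterance against every destination row and pick the best one.
    query = Counter(utterance.lower().split())
    scores = {dest: cosine(query, row) for dest, row in matrix.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Usage with a tiny hypothetical corpus.
calls = [
    ("i want to check my account balance", "accounts"),
    ("what is my savings balance", "accounts"),
    ("i lost my credit card", "cards"),
    ("report a stolen card", "cards"),
]
matrix = build_routing_matrix(calls)
destination, scores = route(matrix, "my card was stolen")
```

Discriminative training, as the abstract describes it, would then adjust the matrix entries so that the correct destination outscores its competitors on the training calls, rather than leaving the rows as plain counts.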
Filter bank design for subband adaptive microphone arrays
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.807353
Jan Mark de Haan, N. Grbic, I. Claesson, S. Nordholm
This paper presents a new method for the design of oversampled uniform DFT-filter banks for the special application of subband adaptive beamforming with microphone arrays. Since array applications rely on the fact that different source positions give rise to different signal delays, a beamformer alters the phase information of the signals. This in turn leads to signal degradations when perfect reconstruction filter banks are used for the subband decomposition and reconstruction. The objective of the filter bank design is to minimize the magnitude of all aliasing components individually, such that aliasing distortion is minimized although phase alterations occur in the subbands. The proposed method is evaluated in a car hands-free mobile telephony environment and the results show that the proposed method offers better performance regarding suppression levels of disturbing signals and much less distortion to the source speech.
Pages: 14-23
Citations: 69
Linear regression based Bayesian predictive classification for speech recognition
Pub Date : 2003-02-19 DOI: 10.1109/TSA.2002.805640
Jen-Tzung Chien
The uncertainty in parameter estimation due to adverse environments deteriorates the classification performance for speech recognition. It becomes crucial to incorporate the parameter uncertainty into the decision so that the classification robustness can be assured. We propose a novel linear regression based Bayesian predictive classification (LRBPC) for robust speech recognition. This framework is constructed under the paradigm of linear regression adaptation of speech hidden Markov models (HMMs). Because the regression mapping between HMMs and adaptation data is ill posed, we properly characterize the uncertainty of regression parameters using a joint Gaussian distribution. A closed-form predictive distribution can be derived to set up the LRBPC decision for speech recognition. Such a decision is robust compared to the plug-in maximum a posteriori (MAP) decision adopted in the maximum likelihood linear regression (MLLR) and MAP linear regression (MAPLR). Since the specified distribution belongs to the conjugate prior family, the evolutionary hyperparameters are established. With the statistically rich hyperparameters, the LRBPC achieves decision robustness. In the experiments, we find that the LRBPC decision in cases of general linear regression as well as single variable linear regression attains significantly better recognition performance than MLLR and MAPLR adaptation.
Pages: 70-79
Citations: 28