Blind deconvolution is fundamental in signal processing applications and, in particular, the single channel case remains a challenging and formidable problem. This paper considers single channel blind deconvolution in the case where the degraded observed signal may be modeled as the convolution of a nonstationary source signal with a stationary distortion operator. The key feature, namely that the source is nonstationary while the channel is stationary, allows either the source or the channel to be identified unambiguously, so that deconvolution is possible; if the source and channel are both stationary, identification is ambiguous. The channel parameters are estimated by modeling the source as a time-varying AR process and the distortion as an all-pole filter, and by using a Bayesian framework for parameter estimation. This estimate can then be used to deconvolve the observed signal. In contrast to the classical histogram approach for estimating the channel poles, which merely relies on the fact that the channel happens to be stationary rather than modeling it as such, the proposed Bayesian method accounts for the channel's stationarity in the model and is consequently more robust. The properties of this model are investigated, and the advantage of exploiting the nonstationarity of a system rather than regarding it as a curse is discussed.
{"title":"Blind single channel deconvolution using nonstationary signal processing","authors":"J. Hopgood, P. Rayner","doi":"10.1109/TSA.2003.815522","DOIUrl":"https://doi.org/10.1109/TSA.2003.815522","url":null,"abstract":"Blind deconvolution is fundamental in signal processing applications and, in particular, the single channel case remains a challenging and formidable problem. This paper considers single channel blind deconvolution in the case where the degraded observed signal may be modeled as the convolution of a nonstationary source signal with a stationary distortion operator. The important feature that the source is nonstationary while the channel is stationary facilitates the unambiguous identification of either the source or channel, and deconvolution is possible, whereas if the source and channel are both stationary, identification is ambiguous. The parameters for the channel are estimated by modeling the source as a time-varyng AR process and the distortion by an all-pole filter, and using the Bayesian framework for parameter estimation. This estimate can then be used to deconvolve the observed signal. In contrast to the classical histogram approach for estimating the channel poles, where the technique merely relies on the fact that the channel is actually stationary rather than modeling it as so, the proposed Bayesian method does take account for the channel's stationarity in the model and, consequently, is more robust. The properties of this model are investigated, and the advantage of utilizing the nonstationarity of a system rather than considering it as a curse is discussed.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"469 1","pages":"476-488"},"PeriodicalIF":0.0,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77508201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose to use neighborhood information in model space to perform utterance verification (UV). First, we present a nested-neighborhood structure for each underlying model in model space and assume that the underlying model's competing models sit in one of these neighborhoods, which is used to model the alternative hypothesis in UV. Bayes factors (BFs) are introduced to UV and used as the main tool to calculate confidence measures based on this idea. Experimental results in the Bell Labs communicator system show that the new method dramatically improves verification performance when verifying correct words against mis-recognized words in the recognizer's output, giving a relative reduction of more than 20% in equal error rate (EER) compared with the standard approach based on likelihood ratio testing and anti-models.
{"title":"A new approach to utterance verification based on neighborhood information in model space","authors":"Hui Jiang, Chin-Hui Lee","doi":"10.1109/TSA.2003.815821","DOIUrl":"https://doi.org/10.1109/TSA.2003.815821","url":null,"abstract":"We propose to use neighborhood information in model space to perform utterance verification (UV). At first, we present a nested-neighborhood structure for each underlying model in model space and assume the underlying model's competing models sit in one of these neighborhoods, which is used to model alternative hypothesis in UV. Bayes factors (BF) is first introduced to UV and used as a major tool to calculate confidence measures based on the above idea. Experimental results in the Bell Labs communicator system show that the new method has dramatically improved verification performance when verifying correct words against mis-recognized words in the recognizer's output, relatively more than 20% reduction in equal error rate (EER) when comparing with the standard approach based on likelihood ratio testing and anti-models.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"6 1","pages":"425-434"},"PeriodicalIF":0.0,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87640258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new perceptually motivated approach is proposed for enhancement of speech corrupted by colored noise. The proposed approach takes into account the frequency masking properties of the human auditory system and reduces the perceptual effect of the residual noise. This new perceptual method is incorporated into a frequency-domain speech enhancement method and a subspace-based speech enhancement method. A better power spectrum/autocorrelation function estimator is also developed to improve the performance of the proposed algorithms. Objective measures and informal listening tests demonstrate significant improvements over other methods when tested with TIMIT sentences corrupted by various types of colored noise.
{"title":"A perceptually motivated approach for speech enhancement","authors":"Y. Hu, P. Loizou","doi":"10.1109/TSA.2003.815936","DOIUrl":"https://doi.org/10.1109/TSA.2003.815936","url":null,"abstract":"A new perceptually motivated approach is proposed for enhancement of speech corrupted by colored noise. The proposed approach takes into account the frequency masking properties of the human auditory system and reduces the perceptual effect of the residual noise. This new perceptual method is incorporated into a frequency-domain speech enhancement method and a subspace-based speech enhancement method. A better power spectrum/autocorrelation function estimator was also developed to improve the performance of the proposed algorithms. Objective measures and informal listening tests demonstrated significant improvements over other methods when tested with TIMIT sentences corrupted by various types of colored noise.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"171 1","pages":"457-465"},"PeriodicalIF":0.0,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79442675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of separating audio sources recorded in a real-world situation is well established in modern literature. One method of solving this problem is blind source separation (BSS) using independent component analysis (ICA). The recording environment is usually modeled as convolutive. Previous research on ICA of instantaneous mixtures provided a solid background for the separation of convolved mixtures. The authors review current approaches to the subject and propose a fast frequency-domain ICA framework, providing a solution to the permutation problem encountered in these methods.
{"title":"Audio source separation of convolutive mixtures","authors":"N. Mitianoudis, M. Davies","doi":"10.1109/TSA.2003.815820","DOIUrl":"https://doi.org/10.1109/TSA.2003.815820","url":null,"abstract":"The problem of separation of audio sources recorded in a real world situation is well established in modern literature. A method to solve this problem is blind source separation (BSS) using independent component analysis (ICA). The recording environment is usually modeled as convolutive. Previous research on ICA of instantaneous mixtures provided solid background for the separation of convolved mixtures. The authors revise current approaches on the subject and propose a fast frequency domain ICA framework, providing a solution for the apparent permutation problem encountered in these methods.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"59 5","pages":"489-497"},"PeriodicalIF":0.0,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91432369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of adapting acoustic models of native English speech to nonnative speakers is addressed from the perspective of adaptive model complexity selection. The goal is to select model complexity dynamically for each nonnative talker so as to optimize the balance between model robustness to pronunciation variations and model detail for the discrimination of speech sounds. A maximum expected likelihood (MEL) based technique is proposed to enable reliable complexity selection when adaptation data are sparse: the expectation of the log-likelihood (EL) of the adaptation data is computed based on distributions of mismatch biases between model and data, and the model complexity is selected to maximize the EL. The MEL based complexity selection is further combined with MLLR (maximum likelihood linear regression) to enable adaptation of both the complexity and the parameters of acoustic models. Experiments were performed on WSJ1 data from speakers with a wide range of foreign accents. Results show that the MEL based complexity selection is feasible with as little as one adaptation utterance, and that it is able to select the proper model complexity dynamically as the amount of adaptation data increases. Compared with standard MLLR, the MEL+MLLR method leads to consistent and significant improvements in recognition accuracy on nonnative speakers, without performance degradation on native speakers.
{"title":"Fast model selection based speaker adaptation for nonnative speech","authors":"Xiaodong He, Yunxin Zhao","doi":"10.1109/TSA.2003.814379","DOIUrl":"https://doi.org/10.1109/TSA.2003.814379","url":null,"abstract":"The problem of adapting acoustic models of native English speech to nonnative speakers is addressed from a perspective of adaptive model complexity selection. The goal is to select model complexity dynamically for each nonnative talker so as to optimize the balance between model robustness to pronunciation variations and model detailedness for discrimination of speech sounds. A maximum expected likelihood (MEL) based technique is proposed to enable reliable complexity selection when adaptation data are sparse, where expectation of log-likelihood (EL) of adaptation data is computed based on distributions of mismatch biases between model and data, and model complexity is selected to maximize EL. The MEL based complexity selection is further combined with MLLR (maximum likelihood linear regression) to enable adaptation of both complexity and parameters of acoustic models. Experiments were performed on WSJ1 data of speakers with a wide range of foreign accents. Results show that the MEL based complexity selection is feasible when using as little as one adaptation utterance, and it is able to select dynamically the proper model complexity as the adaptation data increases. Compared with the standard MLLR, the MEL+MLLR method leads to consistent and significant improvement to recognition accuracy on nonnative speakers, without performance degradation on native speakers.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"17 1 1","pages":"298-307"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82933228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new duration modeling approach for Mandarin speech is proposed. It explicitly takes several major affecting factors into account, modeling them as multiplicative companding factors (CFs), and estimates all model parameters with an EM algorithm. The three basic Tone 3 patterns (i.e., full tone, half tone, and sandhi tone) are also properly treated, using three different CFs to separate how they affect syllable duration. Experimental results show that the variance of the syllable duration is greatly reduced, from 180.17 to 2.52 frame² (1 frame = 5 ms), by the syllable duration model's elimination of the effects of those affecting factors. Moreover, the estimated CFs of those affecting factors agree well with our prior linguistic knowledge. Two extensions of the duration modeling method are also performed. One is the use of the same technique to model initial and final durations. The other is the replacement of the multiplicative model with an additive one. Lastly, a preliminary study of applying the proposed model to predict syllable duration for TTS (text-to-speech) is also performed. Experimental results show that it outperforms the conventional regressive prediction method.
{"title":"A new duration modeling approach for Mandarin speech","authors":"Sin-Horng Chen, Wen-Hsing Lai, Yih-Ru Wang","doi":"10.1109/TSA.2003.814377","DOIUrl":"https://doi.org/10.1109/TSA.2003.814377","url":null,"abstract":"A new duration modeling approach for Mandarin speech is proposed. It explicitly takes several major affecting factors, such as multiplicative companding factors (CFs), and estimates all model parameters by an EM algorithm. The three basic Tone 3 patterns (i.e., full tone, half tone and sandhi tone) are also properly considered using three different CFs to separate how they affect syllable duration. Experimental results show that the variance of the syllable duration is greatly reduced from 180.17 to 2.52 frame/sup 2/ (1 frame = 5 ms) by the syllable duration modeling to eliminate effects from those affecting factors. Moreover, the estimated CFs of those affecting factors agree well with our prior linguistic knowledge. Two extensions of the duration modeling method are also performed. One is the use of the same technique to model initial and final durations. The other is to replace the multiplicative model with an additive one. Lastly, a preliminary study of applying the proposed model to predict syllable duration for TTS (text-to-speech) is also performed. Experimental results show that it outperforms the conventional regressive prediction method.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"74 1","pages":"308-320"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83361525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new quality-scalable high-fidelity multichannel audio compression algorithm based on MPEG-2 advanced audio coding (AAC) is presented. The Karhunen-Loeve transform (KLT) is applied to multichannel audio signals in the preprocessing stage to remove interchannel redundancy. Then, signals in decorrelated channels are compressed by a modified AAC main profile encoder. Finally, a channel transmission control mechanism is used to reorganize the bitstream so that the multichannel audio bitstream remains quality-scalable when it is transmitted over a heterogeneous network. Experimental results show that, compared with AAC, the proposed algorithm achieves better performance while maintaining a similar computational complexity at the regular bit rate of 64 kbit/s per channel. When the bitstream is transmitted to narrowband end users at a lower bit rate, packets in some channels can be dropped, and slightly degraded, yet full-channel, audio can still be reconstructed in a reasonable fashion without any additional computational cost.
{"title":"High-fidelity multichannel audio coding with Karhunen-Loeve transform","authors":"Dai Yang, H. Ai, C. Kyriakakis, C.-C. Jay Kuo","doi":"10.1109/TSA.2003.814375","DOIUrl":"https://doi.org/10.1109/TSA.2003.814375","url":null,"abstract":"A new quality-scalable high-fidelity multichannel audio compression algorithm based on MPEG-2 advanced audio coding (AAC) is presented. The Karhunen-Loeve transform (KLT) is applied to multichannel audio signals in the preprocessing stage to remove interchannel redundancy. Then, signals in decorrelated channels are compressed by a modified AAC main profile encoder. Finally, a channel transmission control mechanism is used to re-organize the bitstream so that the multichannel audio bitstream has a quality scalable property when it is transmitted over a heterogeneous network. Experimental results show that, compared with AAC, the proposed algorithm achieves a better performance while maintaining a similar computational complexity at the regular bit rate of 64 kbit/sec/ch. When the bitstream is transmitted to narrowband end users at a lower bit rate, packets in some channels can be dropped, and slightly degraded, yet full-channel, audio can still be reconstructed in a reasonable fashion without any additional computational cost.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"29 1","pages":"365-380"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77844797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is essential to incorporate perceptual characteristics of human hearing in modern speech/audio coding systems. However, the focus has been confined only to the magnitude information of speech, and little attention has been paid to phase information. A quantitative study on the characteristics of human phase perception is presented and a novel method is proposed for the quantization of phase information in speech/audio signals. First, the just-noticeable difference (JND) of phase for each harmonic in flat-spectrum periodic tones is measured for several different fundamental frequencies. Then, a mathematical model of JND is established, based on measured data, to form a weighting function for phase quantization. Since the proposed weighting function is derived from psychoacoustic measurements, it provides a novel quantization method by which more bits are assigned to perceptually important phase components at the expense of less important ones, resulting in a quantized signal perceptually closer to the original one. Experimental results on five vowel speech signals demonstrate that the proposed weighting function is very effective for the quantization of phase information.
{"title":"Perceptual phase quantization of speech","authors":"Doh-Suk Kim","doi":"10.1109/TSA.2003.814409","DOIUrl":"https://doi.org/10.1109/TSA.2003.814409","url":null,"abstract":"It is essential to incorporate perceptual characteristics of human hearing in modern speech/audio coding systems. However, the focus has been confined only to the magnitude information of speech, and little attention has been paid to phase information. A quantitative study on the characteristics of human phase perception is presented and a novel method is proposed for the quantization of phase information in speech/audio signals. First, the just-noticeable difference (JND) of phase for each harmonic in flat-spectrum periodic tones is measured for several different fundamental frequencies. Then, a mathematical model of JND is established, based on measured data, to form a weighting function for phase quantization. Since the proposed weighting function is derived from psychoacoustic measurements, it provides a novel quantization method by which more bits are assigned to perceptually important phase components at the sacrifice of less important ones, resulting in a quantized signal perceptually closer to the original one. Experimental results on five vowel speech signals demonstrate that the proposed weighting function is very effective for the quantization of phase information.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"3 1","pages":"355-364"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84935583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A generalized subspace approach is proposed for enhancement of speech corrupted by colored noise. A nonunitary transform, based on the simultaneous diagonalization of the clean speech and noise covariance matrices, is used to project the noisy signal onto a signal-plus-noise subspace and a noise subspace. The clean signal is estimated by nulling the signal components in the noise subspace and retaining the components in the signal subspace. The applied transform has built-in prewhitening and can therefore be used in general for colored noise. The proposed approach is shown to be a generalization of the approach proposed by Y. Ephraim and H.L. Van Trees (see ibid., vol. 3, pp. 251-266, 1995) for white noise. Two estimators are derived based on the nonunitary transform, one based on time-domain constraints and one based on spectral-domain constraints. Objective and subjective measures demonstrate improvements over other subspace-based methods when tested with TIMIT sentences corrupted with speech-shaped noise and multi-talker babble.
{"title":"A generalized subspace approach for enhancing speech corrupted by colored noise","authors":"Y. Hu, P. Loizou","doi":"10.1109/TSA.2003.814458","DOIUrl":"https://doi.org/10.1109/TSA.2003.814458","url":null,"abstract":"A generalized subspace approach is proposed for enhancement of speech corrupted by colored noise. A nonunitary transform, based on the simultaneous diagonalization of the clean speech and noise covariance matrices, is used to project the noisy signal onto a signal-plus-noise subspace and a noise subspace. The clean signal is estimated by nulling the signal components in the noise subspace and retaining the components in the signal subspace. The applied transform has built-in prewhitening and can therefore be used in general for colored noise. The proposed approach is shown to be a generalization of the approach proposed by Y. Ephraim and H.L. Van Trees (see ibid., vol.3, p.251-66, 1995) for white noise. Two estimators are derived based on the nonunitary transform, one based on time-domain constraints and one based on spectral domain constraints. Objective and subjective measures demonstrate improvements over other subspace-based methods when tested with TIMIT sentences corrupted with speech-shaped noise and multi-talker babble.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"46 1","pages":"334-341"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86859736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint structures for audio coding and echo cancellation are investigated, utilizing standard audio coders. Two types of audio coders are considered: coders based on cosine-modulated filterbanks and coders based on the modified discrete cosine transform (MDCT). For the first coder type, two methods for combining such a coder with a subband echo canceller are proposed: one modifies the audio coder filterbank so that it is suitable for echo cancellation while still generating the same final decomposition as the standard audio coder filterbank; the other converts subband signals between an audio coder filterbank and a filterbank designed for echo cancellation. For the MDCT-based audio coder, a joint structure with an echo canceller based on a frequency-domain adaptive filter is considered. Computational complexity and transmission delay for the different coder/echo canceller combinations are presented. Convergence properties of the proposed echo canceller structures are shown using simulations with real-life recorded speech.
{"title":"Joint filterbanks for echo cancellation and audio coding","authors":"P. Eneroth","doi":"10.1109/TSA.2003.814798","DOIUrl":"https://doi.org/10.1109/TSA.2003.814798","url":null,"abstract":"Joint structures for audio coding and echo cancellation are investigated, utilizing standard audio coders. Two types of audio coders are considered, coders based on cosine modulated filterbanks and coders based on the modified discrete cosine transform (MDCT). For the first coder type, two methods for combining such a coder with a subband echo canceller are proposed. The two methods are: a modified audio coder filterbank that is suitable for echo cancellation but still generates the same final decomposition as the standard audio coder filterbank, and another that converts subband signals between an audio coder filterbank and a filterbank designed for echo cancellation. For the MDCT based audio coder, a joint structure with a frequency-domain adaptive filter based echo canceller is considered. Computational complexity and transmission delay for the different coder/echo canceller combinations are presented. Convergence properties of the proposed echo canceller structures are shown using simulations with real-life recorded speech.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"45 1","pages":"342-354"},"PeriodicalIF":0.0,"publicationDate":"2003-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75097897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}