Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.862032
T. Nagai, M. Ikehara, M. Kaneko, A. Kurematsu
In this paper, generalized linear-phase lapped orthogonal transforms with unequal-length basis functions (GULLOT) are considered. The basis functions of the proposed GULLOT may differ in length, whereas all the bases of the conventional GenLOT are of equal length. To apply the GULLOT to subband image coding, we also investigate a size-limited structure for processing finite-length signals, which is important in practice.
{"title":"Generalized unequal length lapped orthogonal transform for subband image coding","authors":"T. Nagai, M. Ikehara, M. Kaneko, A. Kurematsu","doi":"10.1109/ICASSP.2000.862032","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.862032","url":null,"abstract":"In this paper, generalized linear phase lapped orthogonal transforms with unequal length basis functions (GULLOT) are considered. The length of each basis of the proposed GULLOT can be different from each other, while all the bases of the conventional GenLOT are of equal length. In order to apply the GULLOT to subband image coding, we also investigate the size-limited structure to process the finite length signal which is important in practice.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130286468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.861887
I. Selesnick, L. Sendur
This paper considers the design and application of wavelet tight frames based on iterated oversampled filter banks. Compared with orthonormal wavelet bases, the greater design freedom makes it possible to construct wavelets with a high degree of smoothness. Gröbner bases are used to solve the nonlinear design equations. Following the dual-tree DWT of Kingsbury (see Proceedings of the Eighth IEEE DSP Workshop, Utah, 1998, and Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Phoenix, 1999), one goal is to keep the redundancy factor bounded by 2, instead of allowing it to grow as it does for the undecimated DWT (which is exactly shift-invariant). For the tight frame presented here, optimal-tree based denoising algorithms can be directly applied.
{"title":"Smooth wavelet frames with application to denoising","authors":"I. Selesnick, L. Sendur","doi":"10.1109/ICASSP.2000.861887","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861887","url":null,"abstract":"This paper considers the design and application of wavelet tight frames based on iterated oversampled filter banks. The greater design freedom available makes possible the construction of wavelets with a high degree of smoothness, in comparison with orthonormal wavelet bases. Grobner bases are used to obtain the solutions to the nonlinear design equations. Following the dual-tree DWT of Kingsbury (see Proceedings of the Eighth IEEE DSP Workshop, Utah, 1998, and Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Phoenix, 1999), one goal is to keep the redundancy-factor bounded by 2, instead of allowing it to grow as it does for the undecimated DWT (which is exactly shift-invariant). For the tight frame presented here, optimal-tree based denoising algorithms can be directly applied.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"380 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134076578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
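The redundancy claim in the abstract above can be checked with a little arithmetic. The sketch below counts coefficients per input sample under an assumption that the abstract does not state: each stage has one lowpass and two highpass branches, every branch is downsampled by 2, and only the lowpass branch is iterated. The function names are illustrative only.

```python
def undecimated_dwt_redundancy(levels: int) -> float:
    """Coefficients per input sample for an undecimated DWT.

    Each of the `levels` highpass subbands and the final lowpass subband
    keeps the full signal length, so the factor grows with depth.
    """
    return float(levels + 1)


def iterated_oversampled_redundancy(levels: int, highpass_channels: int = 2) -> float:
    """Coefficients per input sample for an iterated oversampled filter bank.

    Assumption: one lowpass and `highpass_channels` highpass branches,
    each downsampled by 2, with only the lowpass branch iterated.
    """
    total = 0.0
    fraction = 1.0  # length of the signal entering the current level, relative to the input
    for _ in range(levels):
        fraction /= 2.0                       # every branch is downsampled by 2
        total += highpass_channels * fraction  # highpass subbands kept at this level
    return total + fraction                   # add the final lowpass subband


for levels in (1, 3, 6):
    print(levels,
          undecimated_dwt_redundancy(levels),
          iterated_oversampled_redundancy(levels))
```

Under this assumption the factor is 2 - 2^(-J) after J levels, so it approaches 2 but never reaches it, while the undecimated count J + 1 keeps growing.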
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.860085
U. Walther, G. Fettweis
The new wireless communication standard UMTS applies an advanced dual-mode channel coding scheme. We investigate the feasibility of implementing the algorithm on a digital signal processor device and the implications for the processor architecture. Starting with a base architecture that allows for scalability and customization, we derive new system parameters and compare the complete device with ASIC solutions.
{"title":"DSP implementation issues for UMTS-channel coding","authors":"U. Walther, G. Fettweis","doi":"10.1109/ICASSP.2000.860085","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.860085","url":null,"abstract":"The new wireless communication standard UMTS applies an advanced dual-mode channel coding scheme. We investigate the feasibility of implementing the algorithm on a digital signal processor device and the implication upon the processor architecture. Starting with a base architecture which allows for scalability and customization we derive new system parameters and compare the total device to ASIC solutions.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134127720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.862116
G. Miet, A. Gerrits, J. Valière
This paper describes a system that generates a low-band signal (100-300 Hz) from a telephone-band (300-3400 Hz) speech signal to obtain an extended-band speech signal (100-3400 Hz). The low band increases signal naturalness and listening comfort. The system is applied at the receiving end, so that compatibility with all current telephone networks is maintained. The described technique splits the telephone-band speech signal into a spectral envelope and a short-term residual. The spectral envelope and the residual are extended separately and recombined to create an extended-band signal. The system is evaluated by listening tests and distortion measurements.
{"title":"Low-band extension of telephone-band speech","authors":"G. Miet, A. Gerrits, J. Valière","doi":"10.1109/ICASSP.2000.862116","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.862116","url":null,"abstract":"This paper describes a system that generates a low-band signal (100-300 Hz) from a telephone-band (300-3400 Hz) speech signal to obtain an extended-band speech signal (100-3400 Hz). The low-band increases signal naturalness and listening comfort. This system is applied at the receiving end such that compatibility with all current telephone networks is maintained. The described technique splits the telephone-band speech signal into a spectral envelope and a short-term residual. The spectral envelope and the residual are extended separately and recombined to create an extended band signal. This system is evaluated by listening tests and distortion measurement.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134235338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
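The envelope/residual split described in the abstract above can be illustrated with linear prediction. This is a minimal sketch under assumptions: the abstract does not say which analysis the authors use, so autocorrelation-method LPC is a stand-in, and the order, frame length, 8 kHz sampling rate, and test signal are hypothetical. The extension step itself is not shown, since the abstract gives no detail on it.

```python
import numpy as np


def lpc_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Prediction-error filter A(z) = 1 + a1 z^-1 + ... via Levinson-Durbin."""
    n = len(frame)
    # Autocorrelation for lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err   # reflection coefficient of stage i
        a[1 : i + 1] += k * a[i - 1 :: -1]    # order-update of the coefficients
        err *= 1.0 - k * k                    # remaining prediction-error energy
    return a


def split_envelope_residual(frame: np.ndarray, order: int = 10):
    """Return (envelope coefficients, short-term residual) for one frame."""
    a = lpc_coefficients(frame, order)
    residual = np.convolve(frame, a)[: len(frame)]  # inverse filtering with A(z)
    return a, residual


def recombine(a: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """All-pole synthesis 1/A(z) with zero initial state."""
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


rng = np.random.default_rng(0)
t = np.arange(240) / 8000.0  # 30 ms frame at an assumed 8 kHz sampling rate
frame = (np.sin(2 * np.pi * 500 * t)
         + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.05 * rng.standard_normal(240))
a, residual = split_envelope_residual(frame)
print(np.allclose(recombine(a, residual), frame))
```

Recombining the unmodified envelope and residual returns the original frame, which is the property that lets the two parts be extended separately before synthesis.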
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.861077
S. Cherif, M. Alouane, Mériem Jaïdane
In this paper, a new class of blind algorithms for decision feedback equalization of time-varying channels is proposed. We consider Markovian time variations of the channel impulse response, as in mobile radio communications. The main idea is to modify classical blind algorithms (decision-directed, constant modulus algorithm, ...) so that they acquire self-adaptive knowledge of the channel non-stationarity. Simulations show that the proposed algorithms, non-stationary DD and non-stationary CMA, have better tracking capacity than the classical ones. Hence, they are able to improve the bit error rate, especially under severe propagation conditions.
{"title":"Design of blind decision feedback equalizers for Markovian time varying channels","authors":"S. Cherif, M. Alouane, Mériem Jaïdane","doi":"10.1109/ICASSP.2000.861077","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861077","url":null,"abstract":"In this paper, a new class of blind algorithms designed for decision feedback equalization of time varying channels, is proposed. We consider Markovian time variations of the impulse response of the channel as in radio mobile communications. The main idea is to modify classical blind algorithms (decision-directed, constant modulus algorithm,...) in order to give them self-adaptive knowledge of the channel non-stationarity. Simulations show that the proposed algorithms non-stationary DD and non-stationary CMA present better tracking capacity than the classical ones. Hence, they are able to improve the bit error rate especially for severe propagation conditions.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134360519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
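For reference, the classical constant modulus algorithm that the abstract above takes as a starting point can be sketched as follows. This is not the proposed non-stationary variant and not a decision feedback structure: it is a plain linear FIR equalizer on a fixed channel, with real-valued BPSK symbols. The channel taps, step size, and equalizer length are hypothetical.

```python
import numpy as np


def cma_equalize(received, num_taps=11, step=0.002, dispersion=1.0):
    """Classical CMA with a linear FIR equalizer (real-valued signals).

    `dispersion` is E[a^4] / E[a^2] of the symbol alphabet (1.0 for BPSK).
    """
    w = np.zeros(num_taps)
    w[num_taps // 2] = 1.0                     # centre-spike initialisation
    out = np.zeros(len(received) - num_taps + 1)
    for n in range(len(out)):
        x = received[n : n + num_taps][::-1]   # regressor, most recent sample first
        y = np.dot(w, x)                       # equalizer output
        err = y * (y * y - dispersion)         # CMA error term
        w -= step * err * x                    # stochastic-gradient tap update
        out[n] = y
    return w, out


rng = np.random.default_rng(1)
symbols = rng.choice([-1.0, 1.0], size=8000)            # BPSK source
received = np.convolve(symbols, [1.0, 0.4])[:8000]      # hypothetical static channel
taps, equalized = cma_equalize(received)

early = np.mean((equalized[:500] ** 2 - 1.0) ** 2)
late = np.mean((equalized[-500:] ** 2 - 1.0) ** 2)
print(early, late)
```

The update uses only the output modulus, never the transmitted symbols, which is what makes the algorithm blind. With a fixed step size it has no explicit model of how the channel varies.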
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.862068
H. Schramm, X. Aubert
The paper describes the improved handling of multiple pronunciations achieved in the Philips research decoder by (1) incorporating some prior information about their distributions and (2) combining the acoustic contributions of concurrent alternate word hypotheses. Starting from a baseline system where multiple pronunciations are treated as word copies without priors, an extension of the usual Viterbi decoding is presented which integrates unigram priors in a weighted sum of acoustic probabilities. Several approximations are discussed, leading to new decoding aspects. Experimental results are presented for US broadcast news recordings. It is shown that the use of unigram priors has a clear positive impact on both error rate and decoding cost, while the sum over multiple pronunciation contributions brings another small improvement. An overall 4% reduction of the error rate is achieved on the HUB-4 evaluation sets of 1997 and 1998.
{"title":"Efficient integration of multiple pronunciations in a large vocabulary decoder","authors":"H. Schramm, X. Aubert","doi":"10.1109/ICASSP.2000.862068","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.862068","url":null,"abstract":"The paper describes the improved handling of multiple pronunciations achieved in the Philips research decoder by (1) incorporating some prior information about their distributions and (2) combining the acoustic contributions of concurrent alternate word hypotheses. Starting from a baseline system where multiple pronunciations are treated as word copies without priors, an extension of the usual Viterbi decoding is presented which integrates unigram priors in a weighted sum of acoustic probabilities. Several approximations are discussed leading to new decoding aspects. Experimental results are presented for US broadcast news recordings. It is shown that the use of unigram priors has a clear positive impact on both error rate and decoding cost while the sum over multiple pronunciation contributions brings another small improvement. An overall 4% reduction of the error rate is achieved on the HUB-4 evaluation sets of 97 and 98.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131686991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
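The difference between picking the best pronunciation and summing over pronunciations, as discussed in the abstract above, can be shown in the log domain. This is a minimal sketch for a single, already time-aligned word hypothesis. The decoder's actual approximations are not described in the abstract, and the function name and numbers are hypothetical.

```python
import math


def word_log_score(acoustic_log_likelihoods, log_priors, use_sum=True):
    """Combine the pronunciation variants of one word hypothesis.

    acoustic_log_likelihoods[i] is log p(X | pronunciation i) and
    log_priors[i] is the log unigram prior of pronunciation i.
    """
    joint = [a + p for a, p in zip(acoustic_log_likelihoods, log_priors)]
    best = max(joint)
    if not use_sum:
        return best  # Viterbi-style maximum approximation
    # log-sum-exp: log sum_i prior_i * p(X | pronunciation i)
    return best + math.log(sum(math.exp(j - best) for j in joint))


# Two variants with hypothetical acoustic scores and priors
acoustic = [-120.0, -121.5]
priors = [math.log(0.7), math.log(0.3)]
print(word_log_score(acoustic, priors, use_sum=False))
print(word_log_score(acoustic, priors, use_sum=True))
```

The summed score is never lower than the maximum. The gap is largest when several variants fit the acoustics about equally well.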
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.861800
J. Rottland, G. Rigoll
This paper presents a method to improve the recognition rate of hybrid connectionist/HMM speech recognition systems. At the same time, this approach allows the easy introduction of context-dependent models in the hybrid framework. The approach is based on a standard hybrid connectionist/HMM recognizer, in which the neural nets are trained to estimate the a posteriori probabilities for all phones in each input frame. In the approach presented here, the probabilities of the neural nets are used to replace the codebook of a tied-mixture HMM system. Therefore the resulting system is called tied posterior. The advantages of this structure are that an arbitrary HMM topology can be used, and that all context dependency and all clustering techniques used in tied-mixture systems can be applied to this hybrid speech recognition system. The approach has been evaluated on the Wall Street Journal (WSJ) database, where it outperforms the standard hybrid approach.
{"title":"Tied posteriors: an approach for effective introduction of context dependency in hybrid NN/HMM LVCSR","authors":"J. Rottland, G. Rigoll","doi":"10.1109/ICASSP.2000.861800","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861800","url":null,"abstract":"This paper presents a method to improve the recognition rate of hybrid connectionist/HMM speech recognition systems. At the same time this approach allows the easy introduction of context dependent models in the hybrid framework. The approach is based on a standard hybrid connectionist/HMM recognizer, in which the neural nets are trained to estimate the a posteriori probabilities for all phones in each input frame. In the approach presented here, the probabilities of the neural nets are used to replace the codebook of a tied-mixture HMM system. Therefore the resulting system is called tied posterior. The advantages of this structure are that an arbitrary HMM-topology can be used, and that all context dependency and all clustering techniques used in tied-mixture systems can be applied to this hybrid speech recognition system. The approach has been evaluated on the Wall Street Journal (WSJ) database, with the result, that it outperforms the standard hybrid approach on this task.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131711363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
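One way to read "the probabilities of the neural nets replace the codebook of a tied-mixture HMM" is sketched below. The abstract gives no formula, so this is an interpretation under assumptions. The network posteriors are divided by the class priors, which by Bayes' rule gives likelihoods scaled by the frame-dependent factor 1/p(x). Each HMM state then mixes these shared terms with its own weights. All names and numbers are hypothetical.

```python
import numpy as np


def tied_posterior_scores(posteriors, class_priors, mixture_weights):
    """Scaled state likelihoods for one frame.

    posteriors:      shape (J,), network outputs P(j | x), summing to 1
    class_priors:    shape (J,), prior P(j) of each network output class
    mixture_weights: shape (S, J), row s holds the weights c_sj of HMM state s
    """
    scaled = posteriors / class_priors   # P(j | x) / P(j) = p(x | j) / p(x)
    return mixture_weights @ scaled      # sum_j c_sj * p(x | j) / p(x) per state


posteriors = np.array([0.7, 0.2, 0.1])
class_priors = np.array([0.5, 0.3, 0.2])
# Two hypothetical context-dependent states sharing the same three network outputs
mixture_weights = np.array([[0.9, 0.1, 0.0],
                            [0.6, 0.1, 0.3]])
print(tied_posterior_scores(posteriors, class_priors, mixture_weights))
```

With an identity weight matrix this reduces to the scaled likelihoods of a standard hybrid system, one state per network output. Non-trivial weight rows allow more states than network outputs, which is where context-dependent models would fit.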
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.861810
C. Fügen, I. Rogina
Context decision trees are widely used in the speech recognition community. Besides questions about phonetic classes of a phone's context, questions about the phone's position within a word and questions about the gender of the current speaker have been used so far. In this paper we additionally incorporate questions about current modalities of the spoken utterance, such as the speaker's dialect, the speaking rate, and the signal-to-noise ratio, the latter two of which may change within a single utterance. We present a framework that treats all these modalities in a uniform way. Experiments with the Janus speech recognizer have produced error rate reductions of up to 10% when compared to systems that do not use modality questions.
{"title":"Integrating dynamic speech modalities into context decision trees","authors":"C. Fügen, I. Rogina","doi":"10.1109/ICASSP.2000.861810","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.861810","url":null,"abstract":"Context decision trees are widely used in the speech recognition community. Besides questions about phonetic classes of a phone's context, questions about their position within a word and questions about the gender of the current speaker have been used so far. In this paper we additionally incorporate questions about current modalities of the spoken utterance like the speaker's dialect, the speaking rate, the signal to noise ratio, the latter two of which may change while speaking one utterance. We present a framework that treats all these modalities in a uniform way. Experiments with the Janus speech recognizer have produced error rate reductions of up to 10% when compared to systems that do not use modality questions.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132600651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.860972
J. Riba, G. Vázquez
The use of (spectrally efficient) CPM schemes may lead to a serious performance degradation of the classical non-data-aided (NDA) frequency and timing estimators due to the presence of self-noise. The actual performance of these estimators is usually much worse than that predicted by the classical modified Cramér-Rao bound (MCRB). We apply some well-known results in the field of signal processing to these two important synchronization problems. In particular, we propose the unconditional Cramér-Rao bound (CRB) and explain its meaning in the synchronization task. Simulation results for MSK and GMSK, along with the performance of some classical and previously proposed synchronizers, show that the proposed bound (along with the MCRB) is useful for a better prediction of the ultimate performance of the NDA estimators.
{"title":"Non-data-aided frequency offset and symbol timing estimation for binary CPM: performance bounds","authors":"J. Riba, G. Vázquez","doi":"10.1109/ICASSP.2000.860972","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.860972","url":null,"abstract":"The use of (spectrally efficient) CPM modulations may lead to a serious performance degradation of the classical non-data-aided (NDA) frequency and timing estimators due to the presence of self noise. The actual performance of these estimators is usually much worse than that predicted by the classical modified Cramer-Rao bound. We apply some well known results in the field of signal processing to these two important problems of synchronization. In particular we propose and explain the meaning of the unconditional CRB in the synchronization task. Simulation results for MSK and GMSK, along with the performance of some classical and previously proposed synchronizers, show that the proposed bound (along with the MCRB) is useful for a better prediction of the ultimate performance of the NDA estimators.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132635449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2000-06-05 DOI: 10.1109/ICASSP.2000.860943
Wing-Kin Ma, P. Ching, K. M. Wong
The investigation of detection schemes for non-orthogonal multicarrier modulation (MCM) has two motivations. Firstly, non-orthogonal MCM offers a higher degree of freedom in pulse-shaping design. Secondly, the problem of detecting orthogonal MCM under channel distortion can be viewed as a problem of detecting non-orthogonal MCM. In this work, the maximum likelihood detector (MLD) is considered for non-orthogonal multicarrier systems. In the absence of inter-block interference, it is shown that the MLD can be efficiently realized by a Viterbi algorithm (VA). In contrast to using the VA for channel equalization, the proposed VA has its survivor metrics running in the "frequency domain". Incorporating this VA with an interference-canceling approach, we also develop a decision feedback MLD for the case of non-zero inter-block interference. Superior bit error performance of the MLDs is demonstrated by simulations.
{"title":"Maximum likelihood detection for multicarrier systems employing non-orthogonal pulse shapes","authors":"Wing-Kin Ma, P. Ching, K. M. Wong","doi":"10.1109/ICASSP.2000.860943","DOIUrl":"https://doi.org/10.1109/ICASSP.2000.860943","url":null,"abstract":"Investigation of detection schemes for non-orthogonal multicarrier modulation (MCM) is motivated by two reasons. Firstly, non-orthogonal MCM offers a higher degree of freedom in pulse-shaping design. Secondly, the problem of detecting orthogonal MCM under channel distortion can be viewed as a problem of detecting non-orthogonal MCM. In this work, the maximum likelihood detector (MLD) is considered for non-orthogonal multicarrier systems. In the absence of inter-block interference, it is shown that the MLD can be efficiently achieved by a Viterbi algorithm (VA). In contrast to using the VA for channel equalization, the proposed VA has its survivor metrics running in the\"\"frequency domain\". Incorporating this VA with an interference-canceling approach, we also develop a decision feedback MLD for the case of non-zero inter-block interference. Superior bit error performance of the MLDs is demonstrated by simulations.","PeriodicalId":164817,"journal":{"name":"2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)","volume":"06 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129377296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}