Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861999
P. Kidmose
A new stochastic gradient robust filtering method, based on a non-linear amplitude transformation, is proposed. The method requires no a priori knowledge of the characteristics of the input signals, and it is insensitive to the distribution and to the stationarity of the signals. A simulation study, using both synthetic and real-world signals, shows that the proposed method has better overall robustness, in terms of modeling error, than state-of-the-art robust filtering methods. A remarkable property of the proposed method is that it can handle double-talk in the acoustic echo-cancellation problem.
Title: Adaptive filtering for non-Gaussian processes
Published in: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
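The abstract above does not say which non-linear amplitude transformation is used, so the following is only a minimal sketch of the general idea: a stochastic gradient (LMS-style) filter whose error is passed through a saturating non-linearity so that impulsive disturbances cannot produce large weight updates. The `tanh` non-linearity, the step size `mu`, the `scale` parameter, and the two-tap test system are all assumptions made for illustration, not the method of the paper.

```python
import math
import random


def robust_lms(x, d, num_taps, mu=0.02, scale=1.0):
    """LMS-style update with a saturating error non-linearity (assumed: tanh)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps              # most recent input samples, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        g = scale * math.tanh(e / scale)  # bounded error: |g| <= scale
        w = [wi + mu * g * bi for wi, bi in zip(w, buf)]
    return w


def demo(seed=0, n=5000):
    """Identify a hypothetical 2-tap system under impulsive noise."""
    rng = random.Random(seed)
    h = [0.5, -0.3]                     # hypothetical system to identify
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    d, prev = [], 0.0
    for xn in x:
        noise = rng.gauss(0.0, 0.01)
        if rng.random() < 0.01:         # occasional large impulse
            noise += rng.choice([-50.0, 50.0])
        d.append(h[0] * xn + h[1] * prev + noise)
        prev = xn
    return h, robust_lms(x, d, num_taps=2)
```

Because the transformed error is bounded by `scale`, a single impulse can move each weight by at most `mu * scale * |x|`, which is the property that keeps the estimate close to the true taps in this toy setting.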
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861948
Stephan Kanthak, Kai Schütz, H. Ney
Most modern processor architectures provide SIMD (single instruction multiple data) instructions to speed up algorithms based on vector or matrix operations. This paper describes the use of SIMD instructions to calculate Gaussian or Laplacian densities in a large vocabulary speech recognition system. We present a simple, robust method based on scalar quantization of the mean and observation vector components that incurs no loss in recognition performance while speeding up the whole system's runtime by a factor of 3. Combining the approach with vector space partitioning techniques accelerated the overall system by a factor of over 7. The experiments show that the approach can also be applied to Viterbi training without any loss of accuracy. All experiments were conducted on a German, 10,000-word, spontaneous speech task using two architectures, namely Intel Pentium III and SUN UltraSPARC.
Title: Using SIMD instructions for fast likelihood calculation in LVCSR
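To make the quantization idea above concrete, here is a small sketch in plain Python. It assumes a uniform 8-bit quantizer over a fixed range and a Laplacian density with unit scale, so that the negative log-density reduces (up to a constant) to a sum of absolute differences, which is the kind of operation packed-integer SIMD instructions compute in one step. The range `LO`..`HI`, the number of levels, and the function names are hypothetical; the abstract does not give the quantizer actually used.

```python
LO, HI, LEVELS = -4.0, 4.0, 256          # assumed quantizer range and resolution
STEP = (HI - LO) / (LEVELS - 1)


def quantize(vec):
    """Map each component to an integer code in 0..LEVELS-1 (with clipping)."""
    return [min(LEVELS - 1, max(0, round((c - LO) / STEP))) for c in vec]


def neg_log_laplacian_quantized(q_obs, q_mean):
    """Sum of absolute differences on integer codes, rescaled to signal units."""
    return STEP * sum(abs(a - b) for a, b in zip(q_obs, q_mean))


def neg_log_laplacian_exact(obs, mean):
    """Floating-point reference: L1 distance between observation and mean."""
    return sum(abs(a - b) for a, b in zip(obs, mean))


obs = [0.3, -1.2, 2.5, 0.0]
mean = [0.1, -1.0, 2.0, 0.4]
approx = neg_log_laplacian_quantized(quantize(obs), quantize(mean))
exact = neg_log_laplacian_exact(obs, mean)
```

For in-range inputs each component is off by at most `STEP / 2`, so the quantized score differs from the exact one by at most `len(obs) * STEP`. The inner loop works on small integers only, which is what makes it a candidate for byte-wise SIMD execution.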
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861858
T. Gustafsson, U. Lindgren, H. Sahlin
This paper explores an existing method for signal separation, which is based on second-order statistics. A statistical analysis of a generalized version of the original algorithm is given. The generalized method includes a weighting matrix, and the statistical analysis yields the best possible weighting. The problem of initializing the non-linear optimization involved is also discussed.
Title: Statistical analysis of a signal separation method based on second order statistics
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859180
C. Fredouille, J. Mariéthoz, C. Jaboulet, J. Hennebert, C. Mokbel, F. Bimbot
Classical adaptation approaches are generally used for speaker or environment adaptation of speech recognition systems. In this paper, we use such techniques for the incremental training of client models in a speaker verification system. The initial model is trained on a very limited amount of data and then progressively updated with access data, using a segmental-EM procedure. In supervised mode (i.e. when access utterances are certified), the incremental approach yields performance equivalent to that of the batch approach. We also investigate the impact of various scenarios of impostor attacks during the incremental enrollment phase. All results are obtained with the Picassoft platform, the state-of-the-art speaker verification system developed in the PICASSO project.
Title: Behavior of a Bayesian adaptation method for incremental enrollment in speaker verification
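As a toy illustration of incremental versus batch adaptation, the sketch below applies a standard MAP (Bayesian) update to the mean of a single Gaussian, session by session, and compares it with one update over all the data. This is a deliberately simplified assumption: the system described above uses a segmental-EM procedure on full client models, and the prior weight `prior_count` and the sample values are invented for illustration.

```python
def map_mean(prior_mean, prior_count, samples):
    """MAP update of a Gaussian mean; returns (posterior mean, posterior count)."""
    post_count = prior_count + len(samples)
    post_mean = (prior_count * prior_mean + sum(samples)) / post_count
    return post_mean, post_count


def incremental_enrollment(prior_mean, prior_count, sessions):
    """Update the model after each session, reusing the posterior as the new prior."""
    mean, count = prior_mean, prior_count
    for session in sessions:
        mean, count = map_mean(mean, count, session)
    return mean


def batch_enrollment(prior_mean, prior_count, sessions):
    """Single update using all sessions pooled together."""
    pooled = [x for session in sessions for x in session]
    return map_mean(prior_mean, prior_count, pooled)[0]


sessions = [[1.0, 1.2], [0.8], [1.1, 0.9, 1.3]]   # hypothetical access data
inc = incremental_enrollment(0.0, 2.0, sessions)
bat = batch_enrollment(0.0, 2.0, sessions)
```

In this single-Gaussian case the two results coincide exactly (up to rounding), because the posterior count carries the full sufficient statistics forward. With mixture models and segmental EM the equivalence is only approximate, which is why it has to be checked experimentally.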
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.860941
D. Slock, I. Ghauri
We address the problem of downlink interference rejection in a DS-CDMA system. Periodic orthogonal Walsh-Hadamard sequences spread the different users' symbols, which are then scrambled by a symbol-aperiodic, base-station-specific overlay sequence. The point-to-point propagation channel from the cell site to a given mobile station is the same for all downlink signals (the desired user as well as the intracell interference). Orthogonality of the underlying Walsh-Hadamard sequences is destroyed by multipath propagation, resulting in multiuser interference if a coherent combiner (the RAKE receiver) is employed. In this paper, we propose a blind linear equalization algorithm which equalizes the common downlink channel, thus rendering the user signals orthogonal again. A simple code matched filter subsequently suffices to cancel the multiple access interference (MAI) from intracell users. It is shown that the receiver maximizes the signal-to-interference-plus-noise ratio (SINR) at its output.
Title: Blind maximum SINR receiver for the DS-CDMA downlink
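The loss of orthogonality mentioned above can be checked on a tiny example. The sketch below builds length-4 Walsh-Hadamard codes with the Sylvester construction and shows that two distinct codes are orthogonal when aligned, but not when one of them arrives with a one-chip delay, as a second propagation path would cause. The scrambling sequence and the equalizer are omitted, and the helper names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction; n is assumed to be a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def delay(code, chips):
    """Delay a code by a number of chips, filling with zeros (one symbol only)."""
    return [0] * chips + code[:len(code) - chips]


H = hadamard(4)
aligned = dot(H[2], H[3])            # 0: codes are orthogonal when aligned
echoed = dot(delay(H[2], 1), H[3])   # -3: a one-chip echo leaks into user 3
```

A matched filter for code 3 therefore picks up energy from user 2's delayed path. Equalizing the common channel before despreading removes the delay and restores the zero cross-correlation.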
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.861915
M. Colas, G. Gelle, G. Delaunay
This paper presents some new results concerning the bispectrum of sampled signals. We show that sampling a stationary signal at Fs=2B usually implies a non-zero outer triangle (OT) domain in the bispectrum due to overlapping. Moreover, we point out that the bispectrum of processes (stationary or not) sampled at Fs>3B is always zero in the OT (no overlapping). Finally, we propose an empirical method in which a non-zero OT indicates that the signal is really non-stationary, and we propose to combine this approach with the Hinich stationarity test (Hinich 1990, and Hinich and Messer 1995).
Title: Some new results on the discrete bispectrum
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.860136
G. Orsak, S. Douglas
We propose a blind signal separation algorithm (CLUE) that uses the sum of the individual code lengths of the extracted signals as a measure of the separation performance. The new technique combines a widely available universal data compression routine with any single-parameter search procedure. Unlike previous approaches, the proposed method is model-free and does not rely on the moment values of the signals for its separation performance. An example shows the algorithm's efficiency in separating mixtures of image, audio, and text data.
Title: Code-length-based universal extraction for blind signal separation
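The criterion described above can be sketched with a general-purpose compressor from the Python standard library. The following is an illustrative reading of the abstract, not the CLUE algorithm itself: it assumes two mixtures, a rotation angle as the single search parameter, 8-bit quantization before compression, `zlib` as the universal compressor, and a plain grid search. All of these choices and all function names are assumptions.

```python
import math
import zlib


def code_length(signal):
    """Compressed size in bytes of the signal after 8-bit quantization."""
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0                      # avoid division by zero
    data = bytes(min(255, int(255 * (s - lo) / span)) for s in signal)
    return len(zlib.compress(data, 9))


def total_code_length(x1, x2, theta):
    """Sum of code lengths of the two outputs after rotating by theta."""
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(x1, x2)]
    y2 = [-s * a + c * b for a, b in zip(x1, x2)]
    return code_length(y1) + code_length(y2)


def search_angle(x1, x2, steps=90):
    """Grid search over [0, pi/2) for the angle with the shortest total code."""
    angles = [k * (math.pi / 2) / steps for k in range(steps)]
    return min(angles, key=lambda t: total_code_length(x1, x2, t))


# Hypothetical sources: a square wave and a sawtooth, mixed by a rotation.
n = 2000
s1 = [1.0 if (k // 25) % 2 == 0 else -1.0 for k in range(n)]
s2 = [((k % 40) / 20.0) - 1.0 for k in range(n)]
mix1 = [math.cos(0.5) * a - math.sin(0.5) * b for a, b in zip(s1, s2)]
mix2 = [math.sin(0.5) * a + math.cos(0.5) * b for a, b in zip(s1, s2)]
best = search_angle(mix1, mix2)
```

The sketch makes no claim about how well this particular compressor and quantizer separate real data. It only shows the shape of the approach: a model-free cost (total compressed length) minimized over one parameter.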
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859116
J. Wouters, Michael W. Macon
Concatenative synthesis can produce high-quality speech but is limited to the allophonic variations and voice types that were captured in the database. It would be desirable to modify speech units to remove formant discontinuities and to create new speaking styles, such as hypo- or hyper-articulated speech. Unfortunately, manipulating the spectral structure often leads to degraded speech quality. We investigate two speech modification strategies, one based on inverse filtering and the other on sinusoidal modeling, and we explain their merits and shortcomings for changing the spectral envelope in speech. We then propose a method which uses sinusoidal modeling and represents the complex sinusoidal amplitudes by an all-pole model. The all-pole model approximates the sinusoidal spectrum well, both in the amplitude and in the phase domain. We use the sinusoidal+all-pole model to control the spectral envelope in recorded speech. High-quality modified speech is generated from the model using sinusoidal synthesis. A perceptual test shows that the model was effective at changing vowel identities and was preferred over residual-excited LPC.
Title: Spectral modification for concatenative speech synthesis
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.860094
A. Erdogan, T. Arslan
The paper presents a new algorithm for low-power implementation of digital filters. The algorithm reduces power consumption through a two-phase strategy, which targets the switched capacitance within the multiplier circuit. The first phase segments the coefficients into more primitive components, which can in turn be processed through a single shift and a more primitive multiplication operation. The second phase exploits the correlation among the new set of coefficients at the coefficient input of the multiplier to further reduce the switched capacitance. The paper describes the algorithm and its evaluation environment and provides results for a number of filter examples, demonstrating up to 65% reduction in power compared to conventional filtering.
Title: An order based segmentation algorithm for low power implementation of digital filters
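The first phase described above, splitting a coefficient into a part handled by a shift and a smaller part handled by a multiplication, can be illustrated as follows. The particular split used here (largest power of two not exceeding the magnitude, plus a residual) is an assumption chosen for simplicity; the abstract does not spell out the order-based rule of the paper, and the function names are hypothetical.

```python
def segment(coeff):
    """Split an integer coefficient into (sign, shift, residual).

    |coeff| = 2**shift + residual, with 0 <= residual < 2**shift.
    A zero coefficient is returned as (0, 0, 0).
    """
    if coeff == 0:
        return 0, 0, 0
    sign = -1 if coeff < 0 else 1
    mag = abs(coeff)
    shift = mag.bit_length() - 1          # position of the leading one bit
    return sign, shift, mag - (1 << shift)


def multiply_segmented(x, coeff):
    """Compute x * coeff as one shift plus one narrower multiplication."""
    sign, shift, residual = segment(coeff)
    if sign == 0:
        return 0
    return sign * ((x << shift) + x * residual)


def fir_output(samples, coeffs):
    """One FIR output sample using the segmented multiplier."""
    return sum(multiply_segmented(x, c) for x, c in zip(samples, coeffs))
```

For example, `segment(93)` gives `(1, 6, 29)` because 93 = 64 + 29, so the multiplier only sees the 5-bit residual 29 instead of the 7-bit coefficient. Narrower operands toggle fewer nodes in the multiplier array, which is the intuition behind the reduction in switched capacitance.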
Pub Date: 2000-06-05
DOI: 10.1109/ICASSP.2000.859195
T. Haddad, A. Yongaçoğlu
This paper introduces a new codebook search algorithm for trellis vector quantization (TVQ) systems. The development of the new algorithm is based on the symbol-MAP channel decoding algorithm, which is modified for data compression to deliver soft distortion-related reliability information. Following a rate-distortion theoretic approach, the soft information is used to derive a codebook search algorithm that is capable of solving the problems associated with the LBG algorithm. The derived algorithm is fuzzy in the sense that it follows a soft association rule; however, it is deterministic in its descent towards the global minimum distortion point. Although the derivation is general, the algorithm is tested using gray-scale images, which provide a nonconvex square-error distortion surface. As shown in the simulation section, the new algorithm provides lower-energy codebooks (approximately 0.8 dB gain), while being significantly less sensitive to the initial codebook when short training image sequences are used.
Title: Fuzzy trellis vector quantization of images
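A soft association rule of the kind mentioned above can be sketched for a plain (non-trellis) scalar quantizer. In the sketch, every training sample contributes to every codeword with a weight that decays exponentially with its squared distortion, and each codeword moves to the weighted mean of the samples. The exponential weighting, the sharpness parameter `beta`, and the toy data are assumptions; the trellis structure and the symbol-MAP recursion of the paper are not reproduced.

```python
import math


def soft_weights(x, codebook, beta):
    """Association weights of sample x with each codeword (they sum to 1)."""
    d = [(x - c) ** 2 for c in codebook]
    d_min = min(d)                                  # shift for numerical safety
    u = [math.exp(-beta * (di - d_min)) for di in d]
    total = sum(u)
    return [ui / total for ui in u]


def soft_update(data, codebook, beta):
    """Move each codeword to the weighted mean of all samples."""
    num = [0.0] * len(codebook)
    den = [0.0] * len(codebook)
    for x in data:
        for j, w in enumerate(soft_weights(x, codebook, beta)):
            num[j] += w * x
            den[j] += w
    return [n / dn if dn > 0 else c for n, dn, c in zip(num, den, codebook)]


def train(data, codebook, beta=5.0, iterations=20):
    """Deterministic iteration: no random restarts or hard assignments."""
    for _ in range(iterations):
        codebook = soft_update(data, codebook, beta)
    return codebook


data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]              # hypothetical training set
codebook = train(data, [1.0, 4.0])
```

Letting `beta` grow large recovers the hard nearest-neighbor assignment of LBG, while small values spread each sample over several codewords, which is what reduces the dependence on the initial codebook in this kind of scheme.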