Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781485
C. Ribeiro, I. Trancoso
The coder proposed in this paper falls into the class of segmental vocoders known as phonetic vocoders. Speaker recognisability is one of the main problems faced by vocoders at the lowest bit rates, given the need to reduce speaker-specific information. Hence, phonetic vocoders are well suited to speaker-dependent coding, and can achieve bit rates as low as 250 bit/s. For speaker-independent coding, a speaker adaptation methodology is adopted, although it results in higher bit rates to transmit the speaker-specific information. To further reduce this bit rate, a new method is proposed that exploits the intra-speaker correlation for the same phone.
Title: Speaker adaptation in a phonetic vocoding environment. Published in: 1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351).
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781479
Y. Shoham
This paper presents a method for designing and optimizing predictive vector quantizers (PVQ) for coding the line spectral frequencies (LSF) in LPC-based speech and audio coders. The algorithm is based on iterative optimization of the predictors and the vector-quantizer codebooks. It is shown that the proposed method yields high-quality LSF predictive quantizers whose performance exceeds that of the PVQ used in the G.729 standard.
Title: Coding the line spectral frequencies by jointly optimized MA prediction and vector quantization.
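The predictive-quantization loop the abstract describes can be sketched as follows. This is a minimal illustration of the idea, not the paper's algorithm: a scalar case with a fixed MA predictor operating on past quantized residuals, plus one Lloyd-style centroid update of the codebook (the step that would be iterated jointly with predictor re-estimation). All function names and values are illustrative.

```python
# Sketch of MA-predictive quantization with an iterative (Lloyd-style)
# codebook update, scalar case for brevity. Synthetic data, not LSFs.

def ma_predict(past_residuals, coeffs):
    """MA prediction formed from past *quantized* residuals."""
    return sum(b * r for b, r in zip(coeffs, past_residuals))

def quantize(value, codebook):
    """Nearest-neighbour quantization; returns (index, codeword)."""
    idx = min(range(len(codebook)), key=lambda i: (value - codebook[i]) ** 2)
    return idx, codebook[idx]

def encode(signal, codebook, coeffs):
    """Encode a sequence; also collect the residuals falling in each cell,
    which the codebook update below needs."""
    past = [0.0] * len(coeffs)
    indices = []
    cells = {i: [] for i in range(len(codebook))}
    for x in signal:
        pred = ma_predict(past, coeffs)
        res = x - pred
        idx, q = quantize(res, codebook)
        indices.append(idx)
        cells[idx].append(res)
        past = [q] + past[:-1]  # shift quantized residual into the history
    return indices, cells

def lloyd_update(codebook, cells):
    """One centroid step: move each codeword to the mean of its cell."""
    return [sum(cells[i]) / len(cells[i]) if cells[i] else codebook[i]
            for i in range(len(codebook))]
```

Iterating `encode` and `lloyd_update` until the distortion stops decreasing mirrors the joint optimization loop; the vector (and MA-predictor re-estimation) generalizations follow the same pattern.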
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781499
S. Andersen, W. Kleijn
Reverse water-filling suggests that, at low bit rates, the synthesis filter for predictive encoding should differ from the model filter of the signal to be encoded. However, reverse water-filling follows from assumptions of optimum encoding and a stationary Gaussian source. By means of simple experiments, we show that reverse water-filling applies to predictive encoding of speech. For vector analysis-by-synthesis encoding based on a first-order autoregressive signal model, using a synthesis filter derived from reverse water-filling resulted in consistently improved segmental SNR measures.
Title: Reverse water-filling in predictive encoding of speech.
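For readers unfamiliar with the premise, the standard reverse water-filling result for independent Gaussian components can be computed directly: each component is encoded with distortion equal to the water level theta unless its variance is already below theta, in which case it receives zero rate. The sketch below is textbook rate-distortion theory, not code from the paper.

```python
import math

def reverse_water_filling(variances, theta):
    """Classic reverse water-filling for independent Gaussian components:
    distortion min(theta, sigma_i^2) per component, rate
    0.5*log2(sigma_i^2 / D_i) bits; components under the water level
    get zero rate."""
    distortions = [min(theta, v) for v in variances]
    rates = [0.5 * math.log2(v / d) if v > d else 0.0
             for v, d in zip(variances, distortions)]
    return distortions, rates
```

For variances (4.0, 1.0, 0.25) and water level 0.5, the first two components are coded at distortion 0.5 (1.5 and 0.5 bits), while the weakest component falls below the water level and is not coded at all, which is exactly the regime in which the synthesis filter should deviate from the signal model.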
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781496
N. R. Chong, I. Burnett, J. Chicharo
Adaptation of the waveform interpolation (WI) paradigm to allow waveform coding of speech signals was reported by Kleijn et al. (see Proc. 5th Int. Conf. Spoken Language Processing, Sydney, Australia, Dec. 1998). However, since the signal is time-warped to a constant pitch, processing the surface derived from the new technique depends heavily on having an accurate pitch track. To facilitate vector quantisation techniques, it is necessary to manipulate the pitch track to ensure phase alignment of critically sampled pitch periods. In addition, pitch cycles following unvoiced segments must carry the same phase offset. The adjusted pitch track is used to facilitate a re-warping of the residual signal. The effects of warping and pitch inaccuracies on the transformed result of the warped periods are also discussed.
Title: Adapting waveform interpolation (with pitch-spaced subbands) for quantisation.
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781488
S. Heinen, S. Bleck, P. Vary
In medium to low bit rate speech codecs, the speech signal is represented by a set of parameters. The most important concept at present is code-excited linear predictive (CELP) coding, in which a speech segment of typically 10 to 20 ms is described in terms of prediction coefficients, gain factors, and excitation vectors. Due to the high compression rates (0.5-1.5 bits per speech sample), some of the parameters are highly sensitive to channel noise. In this paper we present a new error protection technique based on a joint optimization of parameter quantization and a redundant non-linear block coding scheme. For parameter reconstruction, the principle of soft bit source decoding is applied. The proposed technique can be used in combination with conventional error protection such as convolutional coding, and allows a flexible subdivision of the gross data rate between source coding and error protection.
Title: Robust speech transmission over noisy channels employing non-linear block codes.
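The soft bit source decoding principle mentioned above replaces hard bit decisions with an MMSE estimate over the whole codebook, weighting each codeword by its posterior probability given the received soft bits. The sketch below illustrates that principle for independent per-bit posteriors; the bit-to-index mapping and codebook values are illustrative, not the paper's.

```python
def soft_bit_decode(codebook, bit_probs):
    """MMSE parameter reconstruction from per-bit posterior probabilities
    (soft bit source decoding, sketched for independent bits).
    bit_probs[k] is the posterior P(bit k == 1) from the channel;
    bit k of a codeword index is taken as (idx >> k) & 1."""
    estimate = 0.0
    total = 0.0
    for idx, value in enumerate(codebook):
        p = 1.0
        for k, p1 in enumerate(bit_probs):
            bit = (idx >> k) & 1
            p *= p1 if bit else (1.0 - p1)  # posterior of this index
        estimate += p * value
        total += p
    return estimate / total
```

With certain bits the estimator reduces to ordinary table lookup; with completely uninformative bits (all posteriors 0.5) it falls back to the codebook mean, which is the graceful-degradation behaviour that makes the approach attractive on noisy channels.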
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781501
K. Koppinen, T. Mikkonen
A new method for fast decoding when using algebraic codes for the fixed codebook in CELP speech coders is presented. The method is based on the trellis structure of a block code and, unlike previous search methods, allows a fast optimal search of the residual codebook even with a combined scalar gain. The method is flexible, allowing long block lengths and the use of any code, including nonlinear ones. Currently the performance is not as high as that of standard algebraic coding methods, but further refinements may make this a viable method.
Title: Fast spherical code decoding algorithms for the residual codebook in CELP coders.
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781516
A. Mustapha, S. Yeldener
This paper presents an adaptive time-domain post-filtering technique based on the least squares approach and a modified Yule-Walker (MYW) filter. Conventionally, post-filtering is derived from the original LPC spectrum. In general, this time-domain technique produces an unpredictable spectral tilt that is hard to control through the modified LPC synthesis, inverse, and high-pass filtering, and hence introduces muffling into the speech quality. Other approaches to designing post-filters were developed in the frequency domain and can only be used in sinusoidal speech coders. We have developed a new time-domain post-filtering technique that eliminates the problem of spectral tilt in the speech spectrum and can be applied to various speech coders. The new post-filter has a flat frequency response at the formant peaks of the speech spectrum. This post-filtering technique has been used in a 4 kb/s harmonic excitation linear predictive coder (HE-LPC), and subjective listening tests have indicated that it outperforms the conventional technique in both single and double tandem connections.
Title: An adaptive post-filtering technique based on a least squares approach.
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781517
H. Tasaki, S. Takahashi
A new post-process for the CELP decoder, called a post noise smoother (PNS), is proposed in order to improve low bit rate speech-coding performance under various background noise conditions. In the PNS, spectral amplitude smoothing and phase randomizing are performed on the decoded speech in order to obtain smoothed background noise. The decoded speech, the smoothed signal, and an automatically generated imitative noise signal are multiplied by adaptive gains and summed to form the final output speech. These gains are computed from each frame's estimated ratio of background noise to signal. Evaluation test results show that the PNS significantly improves the subjective quality of a 4-kbps speech coder under various background noise conditions.
Title: Post noise smoother to improve low bit rate speech-coding performance.
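The final mixing stage described in the abstract (three signals weighted by noise-driven adaptive gains) can be sketched as below. The gain law here is a made-up placeholder purely to show the structure; the paper's actual gain computation from the per-frame noise-to-signal estimate is not given in the abstract.

```python
def mix_frame(decoded, smoothed, imitative, noise_to_signal):
    """Blend decoded speech, the amplitude-smoothed/phase-randomized signal,
    and an imitative noise signal with gains driven by the frame's estimated
    background-noise-to-signal ratio. Gain law is illustrative only."""
    g_noise = min(1.0, max(0.0, noise_to_signal))  # clamp estimate to [0, 1]
    g_dec = 1.0 - 0.5 * g_noise      # back off the raw decoded speech
    g_smooth = 0.4 * g_noise         # bring in the smoothed background
    g_imit = 0.1 * g_noise           # and a little imitative noise
    return [g_dec * d + g_smooth * s + g_imit * n
            for d, s, n in zip(decoded, smoothed, imitative)]
```

In a clean frame (ratio near 0) the output is just the decoded speech; as the estimated background noise grows, the smoothed and imitative components take over, which is the intended "smoothed background" effect.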
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781467
S. Ramprashad
Speech and audio coding are often considered to be two separate technologies, each developing its compression techniques almost independently. At low bit rates the gap in performance between the two becomes noticeable: speech coders work better on speech, and audio coders perform better on music. The challenge is to merge the two technologies into a single coding paradigm that works as well as either one regardless of the input signal. Presented is a multimode speech and audio coder which can adapt almost continuously between speech and audio coding modes. This multimode transform predictive coder (MTPC) shows improved performance on both speech and audio inputs when compared to a single-mode transform predictive coder (TPC).
Title: A multimode transform predictive coder (MTPC) for speech and audio.
Pub Date: 1999-06-20. DOI: 10.1109/SCFT.1999.781511
Doh-Suk Kim, O. Ghitza, P. Kroon
A computational model to predict the MOS (mean opinion score) of processed speech is proposed. The system measures the distortion of processed speech (compared to the source speech) using a peripheral model of the mammalian auditory system and a psychophysically inspired measure, and maps the distortion value onto the MOS scale. This paper describes our attempt to derive a "universal", database-independent, distortion-to-MOS mapping function. Preliminary experimental evaluation shows that the performance of the proposed system is comparable to that of ITU-T Recommendation P.861 for clean speech sources, and exceeds it for speech sources corrupted by either car or babble noise at 30 dB SNR.
Title: A computational model for MOS prediction.
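The last stage of such a system, the distortion-to-MOS mapping, is typically a monotone decreasing curve fitted to subjective test data. The sketch below uses a logistic shape pinned to the 1-5 MOS range; the shape and parameters are assumptions for illustration, not the paper's fitted mapping.

```python
import math

def distortion_to_mos(distortion, slope=1.0, midpoint=4.0):
    """Map a nonnegative auditory-distortion value onto the 1-5 MOS scale
    with a monotone decreasing logistic curve. `slope` and `midpoint`
    stand in for parameters that would be fitted to listening-test data."""
    return 1.0 + 4.0 / (1.0 + math.exp(slope * (distortion - midpoint)))
```

Zero distortion maps near the top of the scale and large distortions saturate toward MOS 1; a database-independent system amounts to finding one such curve whose fitted parameters transfer across test databases.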