Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781482
A. Vahatalo, I. Johansson
This paper describes the VAD (voice activity detection) algorithm that controls DTX (discontinuous transmission) in the GSM AMR (adaptive multi-rate) speech codec. The algorithm is based on spectral estimation and periodicity detection. The VAD contains a 9-band IIR filter bank, which divides the input signal into frequency bands. The signal level and the background noise are estimated in each sub-band. The VAD decision is computed by comparing the input signal level with the background noise estimate. The algorithm incorporates novel methods to estimate background noise and to detect periodic components based on open-loop pitch gain. A new method is also derived to detect correlated complex signals such as music.
{"title":"Voice activity detection for GSM adaptive multi-rate codec","authors":"A. Vahatalo, I. Johansson","doi":"10.1109/SCFT.1999.781482","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781482","url":null,"abstract":"This paper describes the VAD (voice activity detection) for controlling DTX (discontinuous transmission) of the GSM AMR (adaptive multi-rate) speech codec. The algorithm is based on spectral estimation and periodicity detection. The VAD contains a 9-band IIR filter bank, which divides input signals into frequency bands. The signal level at each band is calculated. Background noise is estimated in each sub-band. The VAD decision is computed by comparing input signal level and background noise estimate. The algorithm incorporates novel methods to estimate background noise and to detect periodic components based on open-loop pitch gain. A new method is also derived to detect correlated complex signals like music.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"23 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129023294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
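The sub-band comparison described in the abstract above can be illustrated with a minimal sketch. It assumes the per-band levels have already been produced by a filter bank, and it uses a clipped, squared per-band SNR averaged over bands as the decision score. The function name `vad_decision`, the threshold, and the smoothing factor `alpha` are hypothetical choices for illustration; they are not taken from the paper, and the periodicity and music detectors are left out.

```python
def vad_decision(levels, noise, threshold=2.0, alpha=0.9):
    """Return (is_speech, updated_noise) for one frame.

    levels: per-band signal levels for the current frame
    noise:  per-band background-noise estimates (same length)
    threshold, alpha: illustrative values, not from the paper
    """
    floor = 1e-9  # guards against division by zero
    # Per-band SNR, clipped below at 1 so quiet bands add no evidence.
    snr = [max(1.0, lv / max(nz, floor)) for lv, nz in zip(levels, noise)]
    # Average squared SNR over all bands is the decision score.
    score = sum(s * s for s in snr) / len(snr)
    is_speech = score > threshold
    if is_speech:
        # Freeze the noise estimate while speech is present.
        updated = list(noise)
    else:
        # First-order recursive update toward the current levels.
        updated = [alpha * nz + (1.0 - alpha) * lv
                   for lv, nz in zip(levels, noise)]
    return is_speech, updated
```

Freezing the noise estimate during detected speech is one simple way to keep speech energy from leaking into the background estimate; a practical detector would need additional logic to recover when the noise level rises.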
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781509
K. El-Maleh, P. Kabal
In this paper, we present a novel background noise coding scheme for variable rate speech coders. Existing approaches to noise coding at very low bit rates (i.e., below 1 kbps) fail to reproduce background noise faithfully, which degrades the overall perceptual quality. In our approach, classification of the noise type is used to select the type of excitation to be used at the receiver. To illustrate the benefits of our scheme, we have modified the noise coding mode of the CDMA enhanced variable rate codec (EVRC) to include the proposed class-dependent noise excitation model. Evaluation tests show that the proposed noise coding scheme improves the overall quality without an increase in bit rate.
{"title":"An improved background noise coding mode for variable rate speech coders","authors":"K. El-Maleh, P. Kabal","doi":"10.1109/SCFT.1999.781509","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781509","url":null,"abstract":"In this paper, we present a novel background noise coding scheme for variable rate speech coders. Existing approaches to noise coding at very low bit rates (i.e. below 1 kbps) fail to faithfully reproduce background noise resulting in a degradation of the overall perceptual quality. In our approach, classification of the noise type is used to select the type of excitation to be used at the receiver. To illustrate the benefits of our scheme, we have modified the noise coding mode of the CDMA enhanced variable rate codec (EVRC) to include the proposed class-dependent noise excitation model. Evaluation tests have shown that we have improved the overall quality with the proposed noise coding scheme without an increase in bit rate.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132326448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781515
J. Jensen, S. H. Jensen, E. Hansen
A novel pre-processing algorithm for CELP-coders is proposed. The algorithm aims at perturbing the original signal slightly, such that the perturbed signal is subjectively indistinguishable from the original but can be coded more effectively. A key feature of the algorithm is the possibility of controlling the frequency domain properties of the perturbations. Preliminary simulations with the proposed algorithm in combination with a CELP-like coder indicate improvements in terms of segmental SNR and subjective speech quality.
{"title":"A perturbation-based pre-processing algorithm for CELP-coders","authors":"J. Jensen, S. H. Jensen, E. Hansen","doi":"10.1109/SCFT.1999.781515","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781515","url":null,"abstract":"A novel pre-processing algorithm for CELP-coders is proposed. The algorithm aims at perturbing the original signal slightly, such that the perturbed signal is subjectively indistinguishable from the original but can be coded more effectively. A key feature of the algorithm is the possibility of controlling the frequency domain properties of the perturbations. Preliminary simulations with the proposed algorithm in combination with a CELP-like coder indicate improvements in terms of segmental SNR and subjective speech quality.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132414237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
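The abstract above reports results in terms of segmental SNR. As a reminder of what that measure computes, here is a minimal sketch: the SNR in dB is evaluated per frame and then averaged over frames. The function name and the choice to skip frames with zero signal or zero error energy are assumptions made for illustration; the paper's exact evaluation setup (frame length, clamping limits) is not stated in the abstract.

```python
import math

def segmental_snr_db(reference, coded, frame_len):
    """Mean of per-frame SNR values in dB over non-overlapping frames."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        cod = coded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - c) ** 2 for r, c in zip(ref, cod))
        # Frames with zero signal or zero error have no finite SNR; skip them.
        if signal > 0.0 and noise > 0.0:
            values.append(10.0 * math.log10(signal / noise))
    if not values:
        raise ValueError("no frame with non-zero signal and error energy")
    return sum(values) / len(values)
```

Averaging in the dB domain weights quiet and loud frames equally, which is why segmental SNR tends to track perceived quality more closely than a single SNR over the whole utterance.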
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781498
M. Tammi, V.T. Ruoppila, S. Kuusisto, J. Saarinen
Several speech coding algorithms modify the time scale of the residual signal to facilitate efficient coding of pitch information. Time scaling, however, results in a phase difference between the coded residual signal and the time-variant linear prediction (LP) filter used for synthesis in the decoder. In this paper, we examine the coding distortion induced by this phase difference. Moreover, we show that it may cause audible artifacts in the synthesized speech even if lossless coding of all parameters is employed. These artifacts occur particularly at onsets, when the frequency response of successive LP filters changes rapidly. A waveform interpolation coder is used to illustrate the effects of the phase mismatch.
{"title":"Coding distortion caused by a phase difference between the LP filter and its residual","authors":"M. Tammi, V.T. Ruoppila, S. Kuusisto, J. Saarinen","doi":"10.1109/SCFT.1999.781498","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781498","url":null,"abstract":"Several speech coding algorithms modify the time scale of the residual signal to facilitate efficient coding of pitch information. Time scaling, however, results in a phase difference between the coded residual signal and the time-variant linear prediction (LP) filter used for synthesis in the decoder. In this paper, we examine the coding distortion induced by this phase difference. Moreover, we show that it may cause audible artifacts to the synthesized speech even if lossless coding of all parameters is employed. These artifacts occur particularly at onsets when the frequency response of successive LP filters changes rapidly. A waveform interpolation coder is used to illustrate the effects of the phase mismatch.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122593141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781473
P. Hedelin, F. Nordén, J. Skoglund
In spectral coding of speech, several different criteria are in use for designing and evaluating quantizers. One measure, spectral distortion (SD), has become dominant for comparisons between coders. At run-time, a coder normally quantizes vectors according to other measures, e.g. line spectrum frequency (LSF) distance, in order to keep computational complexity down. In this study, we adopt the SD criterion both in coder design and for quantizer operation. The quantizer is optimized to give minimal average SD scores. This allows us to address the question: is the average SD measure really a good criterion that matches subjective ratings? We perform a few objective and subjective tests based on SD-optimized coding and some versions thereof. Our tests imply that minimizing average SD may not lead to the best subjective scores.
{"title":"SD optimization of spectral coders","authors":"P. Hedelin, F. Nordén, J. Skoglund","doi":"10.1109/SCFT.1999.781473","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781473","url":null,"abstract":"In spectral coding of speech, several different criteria are in use for designing and evaluating quantizers. One measure, spectral distortion (SD), has become dominant for comparisons between coders. At run-time, a coder normally quantizes vectors according to other measures, e.g. line spectrum frequency (LSF) distance, in order to keep computational complexity down. In this study, we adopt the SD criterion both in coder design and for quantizer operation. The quantizer is optimized to give minimal average SD scores, This allows us to address the question, is average SD measure really a good criterion, matching subjective ratings. We perform a few objective and subjective tests based on SD optimized coding and some versions thereof. Our tests imply that minimizing average SD may not lead to the best subjective scoring.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116698876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
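For readers unfamiliar with the SD measure discussed in the abstract above, the following sketch computes the usual root-mean-square difference between two log power spectra in dB, here for all-pole models 1/|A(e^{jω})|². The function names, the uniform frequency grid over (0, π), and the default of 256 points are assumptions for illustration. The abstract does not specify the integration band or resolution used by the authors.

```python
import cmath
import math

def lpc_power_db(a, omega):
    """Power spectrum 1/|A(e^{jw})|^2 in dB, with A(z) = sum a[k] z^-k.

    Assumes A has no zero exactly on the unit circle at this frequency.
    """
    response = sum(ak * cmath.exp(-1j * omega * k) for k, ak in enumerate(a))
    return -20.0 * math.log10(abs(response))

def spectral_distortion_db(a, a_hat, n_points=256):
    """RMS log-spectral difference in dB, sampled on a grid over (0, pi)."""
    total = 0.0
    for i in range(n_points):
        omega = math.pi * (i + 0.5) / n_points
        diff = lpc_power_db(a, omega) - lpc_power_db(a_hat, omega)
        total += diff * diff
    return math.sqrt(total / n_points)
```

Evaluating this for every codebook candidate is far more expensive than a weighted LSF distance, which is the complexity trade-off the abstract refers to.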
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781489
T. Fingscheidt, S. Heinen, P. Vary
In digital mobile speech transmission, the most important (class Ia) bits provided by the speech coding scheme are usually protected by a CRC for error detection. As a consequence, all parameters spanned by the class Ia bits have to be marked at the receiver either as reliable or as unreliable. In contrast to this somewhat coarse approach, we propose the use of what we call parameter individual block codes (PIBC) for the most important codec parameters. This allows joint decoding of speech codec parameters and PIBC, taking advantage of the error-concealing properties of soft-bit speech decoding.
{"title":"Joint speech codec parameter and channel decoding of parameter individual block codes (PIBC)","authors":"T. Fingscheidt, S. Heinen, P. Vary","doi":"10.1109/SCFT.1999.781489","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781489","url":null,"abstract":"In digital mobile speech transmission usually the most important (class la) bits provided by the speech coding scheme are protected by a CRC for error detection. As a consequence all parameters spanned by the class la bits have to be marked at the receiver either as reliable or as unreliable. In contrast to this somewhat coarse approach we propose the usage of what we call parameter individual block codes (PIBC) for the most important codec parameters. This allows joint speech codec parameter and PIBC decoding taking advantage of the error concealing properties of soft-bit speech decoding.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121802999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781464
R. Taori, R. Sluijter
A well-recognised problem in low bit rate representation of audio and speech signals based on the sinusoidal model is that of tracking the sinusoidal components. Imperfections in the analysis process and the presence of components over a limited duration of time give rise to ambiguities in the tracking process. As a solution to this problem, we propose a mechanism to achieve closed-loop tracking by means of analysis-by-synthesis incorporating phase prediction. A simple implementation of such an algorithm is discussed by considering an overlap-add synthesizer. Finally, the results are presented using a voiced speech segment as an example.
{"title":"Closed-loop tracking of sinusoids for speech and audio coding","authors":"R. Taori, R. Sluijter","doi":"10.1109/SCFT.1999.781464","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781464","url":null,"abstract":"A well recognised problem in low bit rate representation of audio and speech signals, based on the sinusoidal model, is that of tracking the sinusoidal components. Imperfections in the analysis process and the presence of components over a limited duration of time gives rise to ambiguities in the tracking process. As a solution to this problem, we propose a mechanism to achieve closed-loop tracking by means of using analysis-by-synthesis incorporating phase prediction. A simple implementation of such an algorithm is discussed by considering an overlap-add synthesizer. Finally, the results are presented using a voiced speech segment as an example.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125917569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781513
A. Karlsson, G. Heikkila, T. B. Minde, M. Nordlund, B. Timus
Measurement of cellular speech quality has applications from equipment installation to daily network maintenance and benchmarking. The area is under development, driven by the cost and lead time of subjective listening tests. Lately, field tests have shown the usability of objective speech quality methods. The new SQI measure, based on radio link parameters, is one of these methods. It is independent of the transmitted signal and can provide better performance than PSQM and much better performance than RxQual when used for network tuning. In this paper, the SQI measure is described and categorized, performance comparison figures are presented, and an argument is given for why speech quality can be estimated from radio link status.
{"title":"Radio link parameter based speech quality index-SQI","authors":"A. Karlsson, G. Heikkila, T. B. Minde, M. Nordlund, B. Timus","doi":"10.1109/SCFT.1999.781513","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781513","url":null,"abstract":"Measurement of cellular speech quality has applications from equipment installation to daily network maintenance and benchmarking. The area is under development, driven by cost and lead-time of subjective listening tests. Lately, field tests has shown usability for objective speech quality methods. The new SQI-measure, based on radio link parameters is one of these methods. It is independent of transmitted signal and can provide better performance than PSQM and much better performance then RxQual, when used for network tuning. In this paper, the SQI measure is described and categorized, performance comparison figures are presented, and a motivation that speech quality is possible to estimate given radio link status is given.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116582821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781481
B. Dumitrescu, I. Tabus
In this paper we propose a new deflation algorithm for line spectral pair (LSP) computation in speech coding. This algorithm is much more reliable than other methods based on deflation.
{"title":"How to deflate polynomials in LSP computation","authors":"B. Dumitrescu, I. Tabus","doi":"10.1109/SCFT.1999.781481","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781481","url":null,"abstract":"In this paper we propose a new deflation algorithm for line spectral pair (LSP) computation in speech coding. This algorithm is much more reliable than other methods based on deflation.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132392864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
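For context on the term used in the abstract above, deflation means dividing a polynomial by the factor of a root that has already been found, so that the remaining roots can be searched in a lower-degree polynomial. The sketch below shows only the generic textbook step, synthetic division by a linear factor. It is not the authors' algorithm, whose details the abstract does not give, and the function name is hypothetical.

```python
def deflate(coeffs, root):
    """Divide p(x) by (x - root) using synthetic division.

    coeffs: polynomial coefficients, highest degree first.
    Returns (quotient_coeffs, remainder).
    """
    quotient = []
    carry = 0.0
    for c in coeffs:
        # Horner step: each partial result is a quotient coefficient.
        carry = carry * root + c
        quotient.append(carry)
    # The final Horner value is p(root), i.e. the remainder.
    remainder = quotient.pop()
    return quotient, remainder
```

For example, `deflate([1.0, -6.0, 11.0, -6.0], 1.0)` divides (x - 1)(x - 2)(x - 3) by (x - 1) and leaves x² - 5x + 6 with zero remainder. When the root is only approximate, the remainder is non-zero and the error carries into every later root, which is the kind of reliability concern that motivates more careful deflation schemes.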
Pub Date : 1999-06-20 DOI: 10.1109/SCFT.1999.781514
R. Sluijter, A.J.E.M. Janssen
A parabolic time warper designed to enhance the stationarity of voiced speech segments is presented. It is shown how, for a harmonic signal segment, the parabolic time warping function can remove the part of the frequency variation that progresses linearly with time, without changing the duration of that segment. In the actual implementation of the time warping system, the linear part of the pitch frequency variation in a segment is removed by maximizing the pitch-related autocorrelation peak of the warped signal. As a by-product, the time warper yields a very reliable pitch estimate. An example on real speech is discussed.
{"title":"A time warper for speech signals","authors":"R. Sluijter, A.J.E.M. Janssen","doi":"10.1109/SCFT.1999.781514","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781514","url":null,"abstract":"A parabolic time warper designed to enhance the stationarity of voiced speech segments, is presented. It is shown how, for a harmonic signal segment, the parabolic time warping function can remove the part of the frequency variation which progresses linearly with time, without changing the time duration of that segment. In the actual implementation of the time warping system, the linear part of the pitch frequency variation in a segment is removed on the basis of maximization of the pitch-related autocorrelation peak of the warped signal. As a by-product, the time warper yields a very reliable pitch estimation. An example on real speech is discussed.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"608 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127522835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}