Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781515
J. Jensen, S. H. Jensen, E. Hansen
A novel pre-processing algorithm for CELP-coders is proposed. The algorithm aims at perturbing the original signal slightly, such that the perturbed signal is subjectively indistinguishable from the original but can be coded more effectively. A key feature of the algorithm is the possibility of controlling the frequency domain properties of the perturbations. Preliminary simulations with the proposed algorithm in combination with a CELP-like coder indicate improvements in terms of segmental SNR and subjective speech quality.
{"title":"A perturbation-based pre-processing algorithm for CELP-coders","authors":"J. Jensen, S. H. Jensen, E. Hansen","doi":"10.1109/SCFT.1999.781515","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781515","url":null,"abstract":"A novel pre-processing algorithm for CELP-coders is proposed. The algorithm aims at perturbing the original signal slightly, such that the perturbed signal is subjectively indistinguishable from the original but can be coded more effectively. A key feature of the algorithm is the possibility of controlling the frequency domain properties of the perturbations. Preliminary simulations with the proposed algorithm in combination with a CELP-like coder indicate improvements in terms of segmental SNR and subjective speech quality.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132414237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781522
J. Epps, W. Holmes
Telephone speech is typically bandlimited to 4 kHz, resulting in a 'muffled' quality. Coding speech with a bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlations between the 0-4 kHz and 4-8 kHz speech bands to re-synthesize wideband speech from decoded narrowband speech. This paper proposes a new technique for highband spectral envelope prediction, based upon codebook mapping with codebooks split by voicing. An objective comparison with several existing methods reveals that this new technique produces the smallest highband spectral distortion. Combined with a suitable highband excitation synthesis scheme, this envelope prediction scheme produces a significant quality improvement in speech that has been coded using narrowband standards.
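The codebook-mapping idea in this abstract can be illustrated with a minimal sketch. It assumes paired narrowband/highband codebooks, one pair per voicing class; the toy codebook values, the dictionary layout, and the function name `predict_highband_envelope` are hypothetical and are not taken from the paper, which would train its codebooks on wideband speech.

```python
import numpy as np

# Hypothetical toy codebooks: each row is a spectral-envelope vector.
# Row i of "narrow" is paired with row i of "high".
CODEBOOKS = {
    True: {   # voiced frames
        "narrow": np.array([[1.0, 0.5], [0.8, 0.9]]),
        "high": np.array([[0.2, 0.1], [0.4, 0.3]]),
    },
    False: {  # unvoiced frames
        "narrow": np.array([[0.3, 0.2], [0.1, 0.6]]),
        "high": np.array([[0.7, 0.8], [0.9, 0.5]]),
    },
}


def predict_highband_envelope(narrow_env, voiced, codebooks=CODEBOOKS):
    """Map a narrowband envelope to a highband envelope by codebook lookup."""
    book = codebooks[bool(voiced)]          # select the codebook pair by voicing
    narrow_env = np.asarray(narrow_env, dtype=float)
    # Nearest narrowband codevector in the squared-error sense.
    dists = np.sum((book["narrow"] - narrow_env) ** 2, axis=1)
    # Return the highband codevector paired with that entry.
    return book["high"][np.argmin(dists)]
```

Splitting by voicing means a voiced and an unvoiced frame with similar narrowband envelopes can still map to different highband envelopes.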
{"title":"A new technique for wideband enhancement of coded narrowband speech","authors":"J. Epps, W. Holmes","doi":"10.1109/SCFT.1999.781522","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781522","url":null,"abstract":"Telephone speech is typically bandlimited to 4 kHz, resulting in a 'muffled' quality. Coding speech with a bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlations between the 0-4 kHz and 4-8 kHz speech bands to re-synthesize wideband speech from decoded narrowband speech. This paper proposes a new technique for highband spectral envelope prediction, based upon codebook mapping with codebooks split by voicing. An objective comparison with several existing methods reveals that this new technique produces the smallest highband spectral distortion. Combined with a suitable highband excitation synthesis scheme, this envelope prediction scheme produces a significant quality improvement in speech that has been coded using narrowband standards.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129703371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781509
K. El-Maleh, P. Kabal
In this paper, we present a novel background noise coding scheme for variable rate speech coders. Existing approaches to noise coding at very low bit rates (i.e., below 1 kbps) fail to reproduce background noise faithfully, resulting in a degradation of the overall perceptual quality. In our approach, classification of the noise type is used to select the type of excitation to be used at the receiver. To illustrate the benefits of our scheme, we have modified the noise coding mode of the CDMA enhanced variable rate codec (EVRC) to include the proposed class-dependent noise excitation model. Evaluation tests have shown that the proposed noise coding scheme improves the overall quality without an increase in bit rate.
{"title":"An improved background noise coding mode for variable rate speech coders","authors":"K. El-Maleh, P. Kabal","doi":"10.1109/SCFT.1999.781509","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781509","url":null,"abstract":"In this paper, we present a novel background noise coding scheme for variable rate speech coders. Existing approaches to noise coding at very low bit rates (i.e. below 1 kbps) fail to faithfully reproduce background noise resulting in a degradation of the overall perceptual quality. In our approach, classification of the noise type is used to select the type of excitation to be used at the receiver. To illustrate the benefits of our scheme, we have modified the noise coding mode of the CDMA enhanced variable rate codec (EVRC) to include the proposed class-dependent noise excitation model. Evaluation tests have shown that we have improved the overall quality with the proposed noise coding scheme without an increase in bit rate.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132326448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781473
P. Hedelin, F. Nordén, J. Skoglund
In spectral coding of speech, several different criteria are in use for designing and evaluating quantizers. One measure, spectral distortion (SD), has become dominant for comparisons between coders. At run-time, a coder normally quantizes vectors according to other measures, e.g. line spectrum frequency (LSF) distance, in order to keep computational complexity down. In this study, we adopt the SD criterion both in coder design and for quantizer operation. The quantizer is optimized to give minimal average SD scores. This allows us to address the question: is the average SD measure really a good criterion that matches subjective ratings? We perform a few objective and subjective tests based on SD optimized coding and some versions thereof. Our tests imply that minimizing average SD may not lead to the best subjective scoring.
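For reference, the spectral distortion measure discussed in this abstract is commonly defined as the root-mean-square difference, in dB, between the log power spectra of the original and quantized LP envelopes. The sketch below assumes this common full-band definition evaluated on a uniform FFT grid; the abstract does not state which variant the authors use (for example, a band-limited one), and the function name and `n_fft` default are illustrative.

```python
import numpy as np


def spectral_distortion_db(a_ref, a_quant, n_fft=512):
    """RMS log-spectral difference (dB) between two all-pole envelopes 1/A(z).

    a_ref, a_quant: LP coefficient vectors [1, a1, ..., ap].
    Assumes neither A(z) has zeros exactly on the unit circle.
    """
    A_ref = np.fft.rfft(a_ref, n_fft)    # A(e^{jw}) sampled on [0, pi]
    A_q = np.fft.rfft(a_quant, n_fft)
    # 10*log10(1/|A_ref|^2) - 10*log10(1/|A_q|^2) = 20*log10(|A_q| / |A_ref|)
    diff_db = 20.0 * np.log10(np.abs(A_q) / np.abs(A_ref))
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

An average SD score is then the mean of this per-frame value over all frames of a test set.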
{"title":"SD optimization of spectral coders","authors":"P. Hedelin, F. Nordén, J. Skoglund","doi":"10.1109/SCFT.1999.781473","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781473","url":null,"abstract":"In spectral coding of speech, several different criteria are in use for designing and evaluating quantizers. One measure, spectral distortion (SD), has become dominant for comparisons between coders. At run-time, a coder normally quantizes vectors according to other measures, e.g. line spectrum frequency (LSF) distance, in order to keep computational complexity down. In this study, we adopt the SD criterion both in coder design and for quantizer operation. The quantizer is optimized to give minimal average SD scores, This allows us to address the question, is average SD measure really a good criterion, matching subjective ratings. We perform a few objective and subjective tests based on SD optimized coding and some versions thereof. Our tests imply that minimizing average SD may not lead to the best subjective scoring.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116698876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781514
R. Sluijter, A.J.E.M. Janssen
A parabolic time warper designed to enhance the stationarity of voiced speech segments is presented. It is shown how, for a harmonic signal segment, the parabolic time warping function can remove the part of the frequency variation which progresses linearly with time, without changing the time duration of that segment. In the actual implementation of the time warping system, the linear part of the pitch frequency variation in a segment is removed on the basis of maximization of the pitch-related autocorrelation peak of the warped signal. As a by-product, the time warper yields a very reliable pitch estimation. An example on real speech is discussed.
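The effect described in this abstract can be reproduced on a synthetic chirp. The sketch below assumes a warping function of the form tau(t) = t + a*t*(t - T), which is parabolic and leaves the segment endpoints (and hence its duration) unchanged. Here the parameter `a` is computed from the known chirp rate; the paper instead selects the warp by maximizing an autocorrelation peak, which is not implemented. All names and signal parameters are illustrative.

```python
import numpy as np


def parabolic_warp(x, t, a):
    """Resample x(t) onto the warped axis tau(t) = t + a*t*(t - T).

    Assumes t starts at 0, is uniformly spaced, and |a*T| < 1 so that
    tau(t) is strictly increasing.
    """
    T = t[-1]
    tau_of_t = t + a * t * (t - T)        # tau(0) = 0 and tau(T) = T
    # Invert the warp: original time corresponding to each uniform tau.
    t_of_tau = np.interp(t, tau_of_t, t)
    # Read the signal at those times (linear interpolation).
    return np.interp(t_of_tau, t, x)


# Demo: a 20 ms segment whose frequency rises linearly from 100 to 120 Hz.
T, n = 0.02, 161
t = np.linspace(0.0, T, n)
f0, k = 100.0, 1000.0                     # start frequency (Hz), chirp rate (Hz/s)
x = np.cos(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

f_c = f0 + 0.5 * k * T                    # constant frequency after warping
a = 0.5 * k / f_c                         # warp parameter that cancels the chirp
y = parabolic_warp(x, t, a)               # approximately cos(2*pi*f_c*t)
```

With this choice of `a`, the phase of the warped signal is linear in the new time variable, so the linear frequency drift is gone while the segment still spans 20 ms.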
{"title":"A time warper for speech signals","authors":"R. Sluijter, A.J.E.M. Janssen","doi":"10.1109/SCFT.1999.781514","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781514","url":null,"abstract":"A parabolic time warper designed to enhance the stationarity of voiced speech segments, is presented. It is shown how, for a harmonic signal segment, the parabolic time warping function can remove the part of the frequency variation which progresses linearly with time, without changing the time duration of that segment. In the actual implementation of the time warping system, the linear part of the pitch frequency variation in a segment is removed on the basis of maximization of the pitch-related autocorrelation peak of the warped signal. As a by-product, the time warper yields a very reliable pitch estimation. An example on real speech is discussed.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"608 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127522835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781464
R. Taori, R. Sluijter
A well-recognised problem in low bit rate representation of audio and speech signals, based on the sinusoidal model, is that of tracking the sinusoidal components. Imperfections in the analysis process and the presence of components over a limited duration of time give rise to ambiguities in the tracking process. As a solution to this problem, we propose a mechanism to achieve closed-loop tracking by means of analysis-by-synthesis incorporating phase prediction. A simple implementation of such an algorithm is discussed by considering an overlap-add synthesizer. Finally, the results are presented using a voiced speech segment as an example.
{"title":"Closed-loop tracking of sinusoids for speech and audio coding","authors":"R. Taori, R. Sluijter","doi":"10.1109/SCFT.1999.781464","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781464","url":null,"abstract":"A well recognised problem in low bit rate representation of audio and speech signals, based on the sinusoidal model, is that of tracking the sinusoidal components. Imperfections in the analysis process and the presence of components over a limited duration of time gives rise to ambiguities in the tracking process. As a solution to this problem, we propose a mechanism to achieve closed-loop tracking by means of using analysis-by-synthesis incorporating phase prediction. A simple implementation of such an algorithm is discussed by considering an overlap-add synthesizer. Finally, the results are presented using a voiced speech segment as an example.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125917569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781489
T. Fingscheidt, S. Heinen, P. Vary
In digital mobile speech transmission, the most important (class Ia) bits provided by the speech coding scheme are usually protected by a CRC for error detection. As a consequence, all parameters spanned by the class Ia bits have to be marked at the receiver either as reliable or as unreliable. In contrast to this somewhat coarse approach, we propose the usage of what we call parameter individual block codes (PIBC) for the most important codec parameters. This allows joint speech codec parameter and PIBC decoding, taking advantage of the error concealing properties of soft-bit speech decoding.
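A rough sketch of how a per-parameter block code can be combined with soft-bit decoding follows. It makes several assumptions that are not in the abstract: the block code is a single even-parity bit appended to the parameter index, the channel supplies an independent error probability for each received bit, and the decoder outputs the a posteriori mean of the quantizer levels. The actual PIBC construction used by the authors is not described here.

```python
import numpy as np


def encode(index, n_bits):
    """Parameter index -> [index bits (MSB first), even-parity bit]."""
    bits = [(index >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    return bits + [sum(bits) % 2]


def soft_decode(received_bits, bit_error_prob, levels, prior=None):
    """Return (parameter estimate, posterior over indices).

    received_bits:  hard-decided bits including the parity bit.
    bit_error_prob: per-bit error probabilities from the channel decoder.
    levels:         quantizer reproduction value for each index.
    Assumes at least one codeword has non-zero likelihood.
    """
    n_bits = len(received_bits) - 1
    n_idx = 2 ** n_bits
    if prior is None:
        prior = np.full(n_idx, 1.0 / n_idx)   # uniform a priori knowledge
    r = np.asarray(received_bits)
    pe = np.asarray(bit_error_prob, dtype=float)
    post = np.empty(n_idx)
    for i in range(n_idx):
        c = np.asarray(encode(i, n_bits))
        # Likelihood of the received bits given that codeword i was sent.
        lik = np.where(c == r, 1.0 - pe, pe)
        post[i] = prior[i] * np.prod(lik)
    post /= post.sum()
    return float(np.dot(post, levels)), post
```

Because the parity bit enters the likelihood of every candidate index, an unreliable bit is corrected or softened for this parameter alone, rather than marking a whole group of parameters as bad.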
{"title":"Joint speech codec parameter and channel decoding of parameter individual block codes (PIBC)","authors":"T. Fingscheidt, S. Heinen, P. Vary","doi":"10.1109/SCFT.1999.781489","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781489","url":null,"abstract":"In digital mobile speech transmission usually the most important (class la) bits provided by the speech coding scheme are protected by a CRC for error detection. As a consequence all parameters spanned by the class la bits have to be marked at the receiver either as reliable or as unreliable. In contrast to this somewhat coarse approach we propose the usage of what we call parameter individual block codes (PIBC) for the most important codec parameters. This allows joint speech codec parameter and PIBC decoding taking advantage of the error concealing properties of soft-bit speech decoding.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121802999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781498
M. Tammi, V.T. Ruoppila, S. Kuusisto, J. Saarinen
Several speech coding algorithms modify the time scale of the residual signal to facilitate efficient coding of pitch information. Time scaling, however, results in a phase difference between the coded residual signal and the time-variant linear prediction (LP) filter used for synthesis in the decoder. In this paper, we examine the coding distortion induced by this phase difference. Moreover, we show that it may cause audible artifacts in the synthesized speech even if lossless coding of all parameters is employed. These artifacts occur particularly at onsets, when the frequency response of successive LP filters changes rapidly. A waveform interpolation coder is used to illustrate the effects of the phase mismatch.
{"title":"Coding distortion caused by a phase difference between the LP filter and its residual","authors":"M. Tammi, V.T. Ruoppila, S. Kuusisto, J. Saarinen","doi":"10.1109/SCFT.1999.781498","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781498","url":null,"abstract":"Several speech coding algorithms modify the time scale of the residual signal to facilitate efficient coding of pitch information. Time scaling, however, results in a phase difference between the coded residual signal and the time-variant linear prediction (LP) filter used for synthesis in the decoder. In this paper, we examine the coding distortion induced by this phase difference. Moreover, we show that it may cause audible artifacts to the synthesized speech even if lossless coding of all parameters is employed. These artifacts occur particularly at onsets when the frequency response of successive LP filters changes rapidly. A waveform interpolation coder is used to illustrate the effects of the phase mismatch.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122593141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781481
B. Dumitrescu, I. Tabus
In this paper we propose a new deflation algorithm for line spectral pair (LSP) computation in speech coding. This algorithm is much more reliable than other methods based on deflation.
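As background for this abstract, the sketch below shows the textbook route from LP coefficients to line spectral frequencies: form the symmetric and antisymmetric polynomials P(z) and Q(z), deflate their trivial roots at z = -1 and z = +1 by polynomial division, and read the remaining root angles. It is not the deflation algorithm proposed in the paper, which the abstract does not describe; it uses NumPy's general-purpose root finder and is limited to even LP order. The function name is illustrative.

```python
import numpy as np


def lpc_to_lsf(a):
    """LP coefficients [1, a1, ..., ap] (p even) -> sorted LSFs in radians."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    if p % 2 != 0:
        raise ValueError("this sketch handles even LP order only")
    a_ext = np.concatenate([a, [0.0]])
    P = a_ext + a_ext[::-1]               # A(z) + z^-(p+1) A(1/z), palindromic
    Q = a_ext - a_ext[::-1]               # A(z) - z^-(p+1) A(1/z), antipalindromic
    # Deflate the trivial roots: z = -1 for P and z = +1 for Q.
    P_defl, _ = np.polydiv(P, [1.0, 1.0])
    Q_defl, _ = np.polydiv(Q, [1.0, -1.0])
    angles = []
    for poly in (P_defl, Q_defl):
        w = np.angle(np.roots(poly))
        angles.extend(w[w > 0])           # keep one root of each conjugate pair
    return np.sort(np.array(angles))
```

For a minimum-phase A(z), the remaining roots lie on the unit circle, so the result has p values strictly between 0 and pi.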
{"title":"How to deflate polynomials in LSP computation","authors":"B. Dumitrescu, I. Tabus","doi":"10.1109/SCFT.1999.781481","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781481","url":null,"abstract":"In this paper we propose a new deflation algorithm for line spectral pair (LSP) computation in speech coding. This algorithm is much more reliable than other methods based on deflation.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132392864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781483
F. Basbug, S. Nandkumar, K. Swaminathan
Robust detection of voice activity for short-term speech frames is essential for the discontinuous transmission (DTX) mode of operation of vocoders such as IS-641. A reference VAD for the IS-641 coder has been chosen for such a purpose and is based on the GSM-EFR (enhanced full rate) VAD. We show by developing a comprehensive evaluation procedure that the reference VAD is sensitive to speech level variations. For example, a significant increase is seen in frames falsely classified as active at speech levels of 10 dB above or below nominal level. We propose a solution based on automatic gain control to reduce level sensitivity. Objective performance measures confirm the robustness of our proposed VAD.
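The level sensitivity described in this abstract, and the way gain control counters it, can be shown with a deliberately simple toy. The sketch assumes a fixed-threshold frame-energy detector and an offline, utterance-level gain normalization; neither is the GSM-EFR-based reference VAD nor the authors' AGC design, and all thresholds, levels, and names are hypothetical.

```python
import numpy as np


def frame_energies_db(x, frame_len):
    """Mean-square energy of each full frame, in dB."""
    n_frames = len(x) // frame_len
    frames = np.reshape(x[: n_frames * frame_len], (n_frames, frame_len))
    return 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)


def energy_vad(x, frame_len=160, threshold_db=-45.0):
    """Toy VAD: a frame is active if its energy exceeds a fixed threshold."""
    return frame_energies_db(x, frame_len) > threshold_db


def agc_normalize(x, target_rms_db=-26.0):
    """Scale the whole signal to a nominal RMS level (offline AGC stand-in)."""
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    return x * (10.0 ** (target_rms_db / 20.0) / rms)


# Demo: five low-level "background" frames followed by five "speech" frames.
fs = 8000
n = np.arange(1600)
tone = np.sin(2 * np.pi * 200.0 * n / fs)
x = np.where(n < 800, 0.004, 0.1) * tone
loud = x * 10.0 ** (10.0 / 20.0)          # same signal, 10 dB above nominal

nominal_decisions = energy_vad(x)                  # background frames inactive
loud_decisions = energy_vad(loud)                  # background frames falsely active
agc_decisions = energy_vad(agc_normalize(loud))    # matches the nominal decisions
```

In this toy, the 10 dB level shift pushes the background frames over the fixed threshold, while normalizing the level first restores the nominal decisions.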
{"title":"Robust voice activity detection for DTX operation of speech coders","authors":"F. Basbug, S. Nandkumar, K. Swaminathan","doi":"10.1109/SCFT.1999.781483","DOIUrl":"https://doi.org/10.1109/SCFT.1999.781483","url":null,"abstract":"Robust detection of voice activity for short-term speech frames is essential for discontinuous transmission (DTX) mode of operation of vocoders such as IS-641. A reference VAD for the IS-641 coder has been chosen for such a purpose and is based on the GSM-EFR (enhance full rate) VAD. We show by developing a comprehensive evaluation procedure that the reference VAD is sensitive to speech level variations. For example, a significant increase is seen in frames falsely classified as active at speech levels of 10 dB above or below nominal level. We propose a solution based on automatic gain control to reduce level sensitivity. Objective performance measures confirm the robustness of our proposed VAD.","PeriodicalId":372569,"journal":{"name":"1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134096726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}