Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781484
M. A. Jackson, I. Burnett
The use of phase-space portraits for speech has been examined by a number of authors. In this paper we examine the use of speech entropy via mutual information to compute the embedding delay of phonemes and hence construct meaningful phase-space portraits. Since speech signals are known to be spectrally redundant, the effects of perceptual masking on phase-space portraits are also considered. The results indicate that the phase-space portrait gives a true indication of the underlying behaviour of phonemes without significant distortion from perceptually masked signal components.
Title: Phase-space portraits of speech employing mutual information and perceptual masking. Published in: 1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351).
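The embedding-delay step described in the Jackson and Burnett abstract above can be illustrated with a minimal Python sketch. It assumes a histogram estimate of the average mutual information and the common "first local minimum" rule for choosing the delay; neither choice is stated in the abstract, and the function names, bin count, maximum lag, and the synthetic sinusoid standing in for a phoneme are all hypothetical.

```python
import math
from collections import Counter


def average_mutual_information(x, lag, bins=16):
    """Histogram estimate of I(x[n]; x[n+lag]) in bits (lag must be >= 1)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    # Map every sample to a histogram bin index in [0, bins - 1].
    q = [min(int((v - lo) / width), bins - 1) for v in x]
    pairs = list(zip(q[:-lag], q[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    # sum over occupied cells of p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log2(c * n / (pa[a] * pb[b]))
               for (a, b), c in joint.items())


def first_minimum_delay(x, max_lag=50, bins=16):
    """First lag at which the mutual-information curve has a local minimum."""
    ami = [average_mutual_information(x, lag, bins)
           for lag in range(1, max_lag + 1)]
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] <= ami[i + 1]:
            return i + 1  # ami[i] belongs to lag i + 1
    return ami.index(min(ami)) + 1  # no local minimum: use the global one


def delay_embed(x, delay, dim=2):
    """Delay vectors (x[n], x[n+delay], ..., x[n+(dim-1)*delay])."""
    span = (dim - 1) * delay
    return [tuple(x[n + k * delay] for k in range(dim))
            for n in range(len(x) - span)]


# Hypothetical stand-in for a voiced phoneme: a sinusoid with a 40-sample period.
signal = [math.sin(2 * math.pi * n / 40) for n in range(2000)]
delay = first_minimum_delay(signal)
portrait = delay_embed(signal, delay, dim=2)  # points of a 2-D phase-space portrait
```

Plotting the pairs in `portrait` against each other gives the two-dimensional portrait; a real experiment would replace `signal` with a windowed phoneme segment.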
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781471
S. Ragot, J. Adoul, R. Lefebvre, R. Salami
State-of-the-art narrowband speech coders operating from 4 to 16 kbit/s are mostly based on the code-excited linear predictive (CELP) model. They achieve good synthesis quality, usually at the expense of high coding complexity. For example, in the 8 kbit/s G.729 coder the innovation codebook search is responsible for approximately half the total coder complexity, the latter being close to 20 MIPS in a fixed-point DSP implementation. Less well known is the relative share of spectral quantization, which is around 8% of the total complexity. CELP coders are still relevant for wideband speech coding, but their complexity is greater than in the narrowband case, which becomes critical for real-time implementations. In this article we propose a two-stage algebraic-stochastic line spectral frequency (LSF) quantization scheme. It combines the strengths of algebraic and stochastic techniques, namely low computation and storage cost and good performance. The generalized Lloyd-Max algorithm is adapted for optimizing lattice codebooks obtained by spherical truncation. Simulations with a Gaussian source show that the quantization method exhibits good quality/complexity tradeoffs. Several stochastic-algebraic LSF quantizers are derived and compared to a more conventional technique.
Title: Low complexity LSF quantization for wideband speech coding
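The two-stage idea in the Ragot et al. abstract above (a stochastic first stage followed by an algebraic, spherically truncated lattice stage) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the lattice, so the checkerboard lattice D_n is used here as a stand-in, and the first-stage codebook, the scale factor, and all function names are hypothetical.

```python
from itertools import product


def dn_codebook(dim, max_norm2):
    """Points of D_n (integer vectors with even coordinate sum) inside a sphere."""
    r = int(max_norm2 ** 0.5)
    return [p for p in product(range(-r, r + 1), repeat=dim)
            if sum(p) % 2 == 0 and sum(c * c for c in p) <= max_norm2]


def nearest_dn_point(x):
    """Round to the nearest D_n point without searching a stored codebook."""
    f = [round(v) for v in x]
    if sum(f) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest error the other way.
        k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
        f[k] += 1 if x[k] > f[k] else -1
    return tuple(f)


def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def two_stage_quantize(x, stage1, scale, max_norm2):
    """Stage 1: trained (stochastic) codebook. Stage 2: truncated lattice on the residual."""
    i = min(range(len(stage1)), key=lambda j: dist2(x, stage1[j]))
    residual = [(a - b) / scale for a, b in zip(x, stage1[i])]
    y = nearest_dn_point(residual)
    if sum(c * c for c in y) > max_norm2:
        # Overload: the lattice point lies outside the truncation sphere,
        # so fall back to an exhaustive search of the truncated codebook.
        y = min(dn_codebook(len(x), max_norm2), key=lambda p: dist2(residual, p))
    return i, y


def two_stage_decode(i, y, stage1, scale):
    return [c + scale * v for c, v in zip(stage1[i], y)]


# Hypothetical toy setup: 4-dimensional vectors, two first-stage codevectors.
stage1 = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]
index, lattice_point = two_stage_quantize([1.3, 0.8, 0.05, -0.02], stage1, 0.25, 4)
reconstruction = two_stage_decode(index, lattice_point, stage1, 0.25)
```

The point of the algebraic stage is visible in `nearest_dn_point`: the second-stage search needs only rounding and a parity check, with no stored codebook in the normal (non-overload) case.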
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781506
Fenghua Liu, R. Heidari
This paper presents recent improvements to an algebraic vector quantized codebook excited linear prediction (AVQ-CELP) speech codec. The objective is to enhance the half-rate mode of the enhanced variable rate codec (EVRC). In the AVQ-CELP scheme, only the perceptually important components are encoded, and the components are selected in a way similar to ACELP. A closed-loop procedure is used to select the sub-vectors. Overlap between the selected sub-vectors is allowed in order to prevent pitch peak splitting. The selected sub-vectors are concatenated and vector quantized. An analysis-by-synthesis strategy is used to determine the optimal excitation. The generalized Lloyd algorithm (GLA) is used to optimize the AVQ codebook. In order to improve the synthesis quality of voiced frames, ACELP is used in strongly voiced frames. The proposed algorithm was incorporated in the Nokia CDMA handset prototype. The field testing results indicate a considerable improvement relative to the standard EVRC operating at the maximum half rate.
Title: Enhancing the EVRC half rate by the algebraic VQ-CELP
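The generalized Lloyd algorithm mentioned in the Liu and Heidari abstract above alternates between a nearest-neighbour partition of the training set and a centroid update. The sketch below shows that iteration in plain Python under a squared-error criterion; the abstract does not state the distortion measure or training data used for the AVQ codebook, so the measure, the function names, and the toy training set are assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def generalized_lloyd(train, codebook, iterations=20):
    """Refine an initial codebook by alternating partition and centroid steps."""
    codebook = [tuple(c) for c in codebook]
    for _ in range(iterations):
        # Partition step: assign each training vector to its nearest codevector.
        cells = [[] for _ in codebook]
        for v in train:
            j = min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
            cells[j].append(v)
        # Centroid step: replace each codevector by the mean of its cell.
        # An empty cell keeps its old codevector.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else old
            for cell, old in zip(cells, codebook)
        ]
    return codebook


# Hypothetical toy training set with two obvious clusters.
training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
trained = generalized_lloyd(training, [(0.0, 0.0), (10.0, 10.0)])
```

In a CELP setting the distance in the partition step would normally be a perceptually weighted error evaluated through the synthesis filter rather than the plain squared error used here.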
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781478
Mi Suk Lee, H. Kim, S. Choi, Hwang-Soo Lee
The line spectral frequencies (LSFs) extracted from successive analysis orders are interlaced with each other. This intermodel interlacing property gives a new relationship between the closeness of LSFs and their spectral sensitivities, which enables us to propose a weighting function for LSF distortion measurement. By applying the proposed weighting function to an LSF quantizer, we can achieve better performance than when using the conventional heuristic functions. Moreover, the complexity of the proposed weighting function is much lower than that of the optimal weighting function, while their performances are almost the same.
Title: On the use of LSF intermodel interlacing property for spectral quantization
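To make the notion of a weighted LSF distortion measure in the Lee et al. abstract above concrete, here is a sketch of a weighted squared-error distance using a familiar inverse-distance heuristic, in which LSFs that lie close to a neighbour receive a larger weight. This is an example of the conventional heuristic family the abstract compares against, not the weighting function the authors propose (which the abstract does not specify); the function names and sample values are hypothetical.

```python
import math


def inverse_distance_weights(lsf, upper=math.pi):
    """Heuristic weights: closely spaced LSFs (sharp spectral peaks) weigh more.

    `lsf` is an increasing list of frequencies in radians, strictly between
    0 and `upper`.
    """
    ext = [0.0] + list(lsf) + [upper]
    return [1.0 / (ext[i] - ext[i - 1]) + 1.0 / (ext[i + 1] - ext[i])
            for i in range(1, len(ext) - 1)]


def weighted_lsf_distance(lsf, lsf_q, weights):
    """Weighted squared error between an LSF vector and its quantized version."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, lsf, lsf_q))


# Hypothetical 3rd-order example: the first two LSFs are close together.
lsf = [0.5, 0.6, 2.0]
weights = inverse_distance_weights(lsf)
distortion = weighted_lsf_distance(lsf, [0.52, 0.6, 1.9], weights)
```

In a quantizer, `weighted_lsf_distance` would replace the plain squared error in the codebook search, so that errors on perceptually sensitive LSFs are penalized more.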
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781521
N. Enbom, W. Kleijn
Telephone speech is usually limited to less than 4 kHz in bandwidth. This bandwidth limitation results in the typical sound of telephone speech. We present a new method of regenerating the high frequencies (4-8 kHz) based on vector quantization of the mel-frequency cepstral coefficients (MFCC). We also present two methods to avoid perceptually annoying overestimates of the signal power in the high-band. Listening tests show the benefits of the new procedures. Use of MFCC for vector quantization instead of traditionally used spectral representations improves the quality of the speech significantly. Tests also show that the wide-band speech reconstructed with the method is significantly more pleasant to the human ear than the original narrowband speech.
Title: Bandwidth expansion of speech based on vector quantization of the mel frequency cepstral coefficients
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781518
J. Collura
Recent advances in speech enhancement and noise pre-processing algorithms have dramatically improved the quality and intelligibility of speech signals, both in the presence of acoustic noise and in benign environments. The use of speech enhancement in combination with voice coding algorithms, applied to governmental wireless communications systems, is an important application area. This paper first introduces one such system, the 2.4 kbps US Government Military Standard mixed excitation linear prediction (MELP) speech coding algorithm coupled with a speech enhancement algorithm developed by AT&T Research Labs. Next, the paper presents a discussion of the test conditions and results, and provides an interpretation of these results. Finally, a general discussion of related issues and conclusions is presented.
Title: Speech enhancement and coding in harsh acoustic noise environments
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781468
N. Harada, H. Ohmuro
We propose a 5-kHz-bandwidth CELP speech coder for various multimedia applications. For portability, the coder has three bit rate modes: MODE8 (7.8 kbit/s), MODE6 (5.75 kbit/s) and MODE4 (3.95 kbit/s). The bit rate mode can be switched frame by frame. In order to achieve both low bit rate and naturalness, 5-kHz-bandwidth speech signals are used instead of 3.4-kHz or 7-kHz-bandwidth signals. The speech signals under consideration are band-limited to 5 kHz and are sampled at 11.025 kHz. Subjective listening tests indicated that the 5-kHz bandwidth is effective for low-bit-rate speech coders. The mean opinion score and comparative mean opinion score showed that the quality of this coder in MODE8 (5 kHz, 7.8 kbit/s) is better than that of G.729 (3.4 kHz, 8 kbit/s) and G.722 (7 kHz, 48 kbit/s), and equivalent to that of G.729 Annex E (3.4 kHz, 11.8 kbit/s). In addition, in MODE6 (5 kHz, 5.75 kbit/s), the quality of this coder is better than that of G.723.1 (3.4 kHz, 6.3 kbit/s), and equivalent to that of G.729 (3.4 kHz, 8 kbit/s). We also determine the relationship among characterizations of subjective quality, bandwidth and noisiness.
Title: 5-kHz-bandwidth speech coder at 4-8 kbit/s
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781508
T. Nomura, M. Iwadare
This paper proposes bitrate adaptation schemes for real-time voice over IP applications, based on the multirate and scalable coding capabilities of MPEG-4 CELP. For IP telephony applications, the adaptation scheme based on multirate coding is used to achieve the best coding quality at a given bitrate. For broadcast applications, the scalable CELP coder produces a layered bitstream, and the bitrate control at IP routers determines the number of bitstream layers depending on the network throughput. A performance evaluation of bitrate adaptation using the MPEG-4 wideband CELP coder is presented.
Title: Voice over IP systems with speech bitrate adaptation based on MPEG-4 wideband CELP
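The router-side behaviour described in the Nomura and Iwadare abstract above, where the number of forwarded bitstream layers depends on the available throughput, can be sketched as a simple greedy rule. The layer bitrates and the function name below are hypothetical illustrations, not values from the paper or from the MPEG-4 specification.

```python
def layers_to_forward(layer_rates_kbps, throughput_kbps):
    """Count how many layers, starting from the base layer, fit in the throughput.

    Layers must be kept in order because each enhancement layer can only be
    decoded on top of the layers below it.
    """
    total = 0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > throughput_kbps:
            break
        total += rate
        count += 1
    return count


# Hypothetical layered stream: a 12 kbit/s base layer plus two 4 kbit/s enhancement layers.
rates = [12, 4, 4]
forwarded = layers_to_forward(rates, 17)  # base layer plus one enhancement layer fit
```

A return value of 0 means that not even the base layer fits; how a router should react in that case (drop the stream or forward the base layer anyway) is a policy decision the sketch leaves open.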
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781480
J. J. Parry, I. Burnett, J. Chicharo
In this paper we investigate a novel approach to low-bit-rate quantisation of the spectral parameters of speech. This approach incorporates phonetic information into the structure of line spectral frequency (LSF) codebooks. As clear relationships exist between phonetic segments and LSFs, phonetic events can be expressed in terms of the structure of an LSF codebook and the successive vectors chosen from it. The investigation leads to the conclusion that the structure of LSF codebooks can be usefully employed in phonetic classification as a front end to multi-modal phonetic vocoding.
Title: The use of LSF-based phonetic classification in low-rate coder design
Pub Date: 1999-06-20 DOI: 10.1109/SCFT.1999.781487
T. Fischer
Trellis coded quantization (TCQ) is an efficient form of multidimensional quantization that achieves portions of the possible point density, space filling, and granular gains promised by vector quantization. For memoryless sources, the combination of TCQ and a suitable entropy code can provide performance within 0.5 dB of the rate-distortion limit.
Title: Trellis coded quantization
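As a rough illustration of the trellis coded quantization idea in the Fischer abstract above, the sketch below runs a Viterbi search over a 4-state trellis whose branches are labelled with four subsets of a scalar codebook. The state-transition and subset tables follow one commonly used 4-state labelling and are an assumption here, as are the uniform reproduction levels and the function name; entropy coding of the output, which the abstract mentions, is not included.

```python
# Branch tables for a 4-state trellis: from state s, branch bit b leads to
# NEXT[s][b] and may only use reproduction levels from subset SUBSET[s][b].
NEXT = [(0, 1), (2, 3), (0, 1), (2, 3)]
SUBSET = [(0, 2), (1, 3), (2, 0), (3, 1)]


def tcq_encode(x, levels):
    """Viterbi search for the minimum squared-error path through the trellis.

    levels[k] belongs to subset k % 4. Returns (level indices, total distortion).
    For simplicity the chosen level index is stored directly; a real encoder
    would send the branch bit plus an index within the subset.
    """
    inf = float("inf")
    cost = [0.0, inf, inf, inf]  # the encoder starts in state 0
    paths = [[], [], [], []]
    for sample in x:
        new_cost = [inf] * 4
        new_paths = [None] * 4
        for s in range(4):
            if cost[s] == inf:
                continue  # state not reachable yet
            for b in (0, 1):
                d = SUBSET[s][b]
                # Best reproduction level inside the subset allowed on this branch.
                k = min(range(d, len(levels), 4),
                        key=lambda i: (sample - levels[i]) ** 2)
                c = cost[s] + (sample - levels[k]) ** 2
                t = NEXT[s][b]
                if c < new_cost[t]:  # keep only the survivor path into state t
                    new_cost[t] = c
                    new_paths[t] = paths[s] + [k]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best], cost[best]


# Hypothetical example: 8 uniform levels, i.e. 2 bits/sample after subset partitioning.
levels = [-3.5 + k for k in range(8)]
indices, distortion = tcq_encode([0.4, -1.2, 2.6, 0.1, -3.0], levels)
```

Because each state offers only two of the four subsets, a sample may be forced onto a level that is not its nearest neighbour; the Viterbi search trades such local losses against the lower distortion of the whole sequence.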