Robust voice activity detection for DTX operation of speech coders
F. Basbug, S. Nandkumar, K. Swaminathan
Pub Date : 1999-06-20  DOI: 10.1109/SCFT.1999.781483
Robust detection of voice activity in short-term speech frames is essential for the discontinuous transmission (DTX) mode of operation of vocoders such as IS-641. A reference VAD for the IS-641 coder, based on the GSM-EFR (enhanced full rate) VAD, has been chosen for this purpose. Using a comprehensive evaluation procedure, we show that the reference VAD is sensitive to speech level variations: for example, the number of frames falsely classified as active increases significantly at speech levels 10 dB above or below the nominal level. We propose a solution based on automatic gain control to reduce this level sensitivity. Objective performance measures confirm the robustness of the proposed VAD.
LSP quantization in wideband speech coders
M. Ferhaoui, S. Van Gerven
Pub Date : 1999-06-20  DOI: 10.1109/SCFT.1999.781472
This paper deals with multi-stage vector quantization of line spectrum pair (LSP) parameters in wideband speech coders and discusses commonly used spectral distortion measures and their relation to the perceptual quality of the coded speech.
Quantization of SEW and REW components for 3.6 kbit/s coding based on PWI
U. Bhaskar, S. Nandkumar, K. Swaminathan, G. Zakaria
Pub Date : 1999-06-20  DOI: 10.1109/SCFT.1999.781497
The design of a prototype waveform interpolation (PWI) based codec operating at 3.6 kbit/s is presented, with the main focus on the quantization of the slowly evolving waveform (SEW) and rapidly evolving waveform (REW) components. The SEW magnitude component is quantized using a hierarchical mean-shape-gain predictive vector quantization approach. The SEW phase is derived from a phase model based on a measure of voice periodicity. The REW magnitude is quantized using a gain and a sub-band based shape. The REW phase is obtained by high-pass filtering a weighted combination of the SEW and a white noise process.
Performance of current perceptual objective speech quality measures
L. Thorpe, Wonho Yang
Pub Date : 1999-06-20  DOI: 10.1109/SCFT.1999.781512
This paper describes the performance of current objective speech quality measures designed to estimate subjective quality. We examined perceptual objective quality measures using a wide range of distortions including speech compression, wireless channel impairments, VoIP channel impairments, and modifications to the signal from features such as AGC. The results of this study indicate the range of conditions to which these objective measures may be applied, the validity of the estimates they provide, and the general maturity of the field.
The adaptive multi-rate speech coder
E. Ekudden, R. Hagen, I. Johansson, J. Svedberg
Pub Date : 1999-06-20  DOI: 10.1109/SCFT.1999.781503
In this paper, we describe the adaptive multi-rate (AMR) speech coder currently under standardization for GSM systems as part of the AMR speech service. The coder is a multi-rate ACELP coder with 8 modes operating at bit-rates from 12.2 kbit/s down to 4.75 kbit/s. The coder modes are integrated in a common structure where the bit-rate scalability is realized mainly by altering the quantization schemes for the different parameters. The coder provides seamless switching on 20 ms frame boundaries. The quality when used on GSM channels is significantly higher than for existing services.
Advances in objective estimation of perceived speech quality
S. Voran
Pub Date : 1999-06-01  DOI: 10.1109/SCFT.1999.781510
We present two techniques that can be used to enhance objective estimators of perceived speech quality. Frame normalization and frame-energy plane partitioning are described and applied to a log-spectral-error-based estimator. The resulting estimators are compared with each other and with two established estimators. This is done through correlation with MOS values from 17 formal subjective tests. We find that the proposed techniques significantly improve the log-spectral-error-based estimator.
Study and subjective evaluation on MPEG-4 narrowband CELP coding under mobile communication conditions
K. Ozawa, T. Nomura, M. Serizawa, H. Ehara, K. Yoshida, N. Tana
Pub Date : 1999-03-08  DOI: 10.1109/SCFT.1999.781507
This paper evaluates MPEG-4 narrowband (NB) CELP speech coding under various mobile communication conditions, such as clean speech, background noise, and transmission errors. To make the codec robust against these errors with a minimal increase in redundant bits, a CRC code is attached to the bitstream and error concealment is included in the decoder. Subjective evaluation results demonstrate that the speech quality of MPEG-4 speech coding at and above 8.3 kb/s is higher than that of ITU-T G.726 ADPCM at 32 kb/s in the clean speech condition. Further, the speech quality degradation is less than 0.1 MOS at a 10^-3 bit error rate, and the quality remains comparable to or higher than that of G.726 at 32 kb/s without errors.
MVDR based all-pole modeling: properties, enhancements, and comparisons
M. Murthi, B. Rao
DOI: 10.1109/SCFT.1999.781474
In this paper, we present several features of minimum variance distortionless response (MVDR) based all-pole filters which are suitable for modeling all types of speech. In particular, we demonstrate how the MVDR all-pole spectrum, based upon time-domain correlations, can provide high quality spectral envelope modeling of voiced speech. Simulation results are included showing that the MVDR all-pole spectrum's modeling of voiced speech harmonics improves as the model order increases, leading to a monotonically decreasing spectral distortion. Furthermore, we show how the MVDR all-pole envelope can be enhanced by using forward-backward linear prediction. In addition, low order (10-14) MVDR based all-pole filters are examined and compared with other all-pole spectral envelopes. The reduced order MVDR all-pole spectrum is shown to compare favorably with linear prediction (LP) and LP cubic spline spectral envelopes in terms of spectral modeling and complexity.
Embedded WI coding between 2.0 and 4.8 kbit/s
Hong-Goo Kang, D. Sen
DOI: 10.1109/SCFT.1999.781493
This paper describes an embedded speech coder based on waveform interpolation (WI) techniques. Since the quantization of line spectral frequency (LSF) parameters is fairly orthogonal to the quantization of the excitation information, designing an embedded system with WI is much easier than with other approaches. By using a hierarchical bit allocation for the excitation signal, which consists of a slowly evolving waveform (SEW) and a rapidly evolving waveform (REW), the proposed system works well at bit-rates of 2.0, 2.4, 3.0, 4.0, and 4.8 kbit/s. Listening tests indicate that the performance of the new system is comparable to an optimized fixed-rate WI coder and that the quality degrades gracefully as the bit-rate decreases.