1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351): latest articles

LPC quantization requirements for the GPP-CELP coder
P. Mermelstein, Y. Qian, K. Zarrinkoub
Code-excited linear prediction coding with generalized pitch prediction (GPP-CELP) requires linear prediction filtering of the stochastic codebook output prior to addition of the adaptive codebook (ACE) component. The ACE component represents a sequence of past reconstructed samples passed through a low-pass filter to reflect the weaker pitch periodicity at higher speech frequencies. The spectrum of the residual shows broad peaks, leading to significantly narrower distributions in the LPC parameter space. Additionally, the quantization error of the residual may be masked by the significantly greater energy of the ACE component. This work compares the quantization requirements of the time-varying LPC filter in the GPP-CELP coder with those of the classical CELP coder. With non-predictive coding of the LPC information, a bit-rate reduction from 20 bits/20 ms to 16 bits/20 ms appears feasible without introducing noticeable degradation due to quantization.
DOI: https://doi.org/10.1109/SCFT.1999.781477 (published 1999-06-20)
Citations: 1
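The rates in the abstract above are per-frame bit budgets for the LPC side information; converting them to bits per second makes the claimed saving concrete. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def side_info_rate_bps(bits_per_frame: int, frame_ms: float) -> float:
    """Bit rate (bits/s) spent on LPC side information for a given frame budget."""
    return bits_per_frame * 1000.0 / frame_ms


classical = side_info_rate_bps(20, 20)  # 20 bits every 20 ms
gpp_celp = side_info_rate_bps(16, 20)   # 16 bits every 20 ms
saving = classical - gpp_celp

print(f"{classical:.0f} bps -> {gpp_celp:.0f} bps, saving {saving:.0f} bps ({saving / classical:.0%})")
# prints: 1000 bps -> 800 bps, saving 200 bps (20%)
```

In other words, the LPC side information would drop from 1.0 kbit/s to 0.8 kbit/s.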
A novel pitch-lag search method using adaptive weighting and median filtering
P. Ojala, P. Haavisto, A. Lakaniemi, J. Vainio
This paper presents a novel method to estimate the pitch lag in a speech codec. The pitch lag is related to the fundamental frequency of the speech signal, and accurate estimation of this parameter is important for the subjective quality of the synthesised speech. A common problem in speech codecs is that pitch-lag estimation often produces a multiple or a sub-multiple of the true pitch value; when these incorrect pitch-lag values are used in speech synthesis, the subjective quality of the speech is degraded. This paper presents an improved method in which the pitch-lag estimate is biased towards the pitch-lag values of previous speech segments, resulting in a consistent sequence of pitch-lag values and a high-quality reconstructed signal. The classification of speech into voiced and unvoiced parts is used when tracking the pitch-lag values and adapting the weighting function centred on the pitch track.
DOI: https://doi.org/10.1109/SCFT.1999.781502 (published 1999-06-20)
Citations: 3
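Biasing the lag search towards the recent pitch track can be sketched as follows. This is not the authors' algorithm: the triangular weighting, its `width` and `boost` parameters, and the use of the median of recent lags as the track centre are illustrative assumptions. The example shows how such a bias can reject a pitch-doubling candidate that has a slightly higher raw correlation.

```python
import statistics
from typing import Optional, Sequence


def track_center(history: Sequence[int], voiced: bool) -> Optional[float]:
    """Median of recent lags; the median ignores isolated doubling/halving errors."""
    if voiced and history:
        return statistics.median(history)
    return None  # unvoiced or no history: search without bias


def weighted_lag_search(corr: Sequence[float], lag_min: int,
                        history: Sequence[int], voiced: bool,
                        width: float = 0.2, boost: float = 0.3) -> int:
    """Pick the lag maximizing correlation times a weight centred on the pitch track.

    corr[i] is the normalized correlation for lag lag_min + i.
    width: relative distance from the centre at which the bonus reaches zero (assumed).
    boost: maximum extra weight at the centre (assumed).
    """
    center = track_center(history, voiced)
    best_lag, best_score = lag_min, float("-inf")
    for i, c in enumerate(corr):
        lag = lag_min + i
        weight = 1.0
        if center is not None:
            rel = abs(lag - center) / center
            weight += boost * max(0.0, 1.0 - rel / width)  # triangular bonus near the track
        score = c * weight
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Candidate lags 20..119; the true lag 40 competes with its double, 80.
corr = [0.0] * 100
corr[40 - 20] = 0.80
corr[80 - 20] = 0.85
print(weighted_lag_search(corr, 20, history=[40, 41, 39], voiced=True))   # 40
print(weighted_lag_search(corr, 20, history=[40, 41, 39], voiced=False))  # 80
```

Disabling the bias for unvoiced frames mirrors the abstract's point that voicing classification controls the tracking; without a track, the raw maximum (here the doubled lag) wins.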
Parametric speech coding-HVXC at 2.0-4.0 kbps
M. Nishiguchi, A. Inoue, Y. Maeda, J. Matsumoto
The MPEG-4 parametric speech coding algorithm, harmonic vector excitation coding (HVXC), is described. New features of the coder include a quantizer scheme capable of generating 2.0 and 4.0 kbps scalable bit-streams, where 2.0 kbps decoding is possible using a subset of the 4.0 kbps bit-stream. Time-scale modification of speech for fast and slow playback is also possible without changing pitch or phonemes. Listening tests show that the proposed coding method at 2.0 kbps provides significantly better quality than FS1016 CELP at 4.8 kbps. In October 1998, the HVXC coder was adopted in the Final Draft International Standard (FDIS) of MPEG-4.
DOI: https://doi.org/10.1109/SCFT.1999.781492 (published 1999-06-20)
Citations: 16
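Rate scalability of this kind means a 2.0 kbps decoder can discard part of every 4.0 kbps frame. The sketch below illustrates only that principle; the 20 ms frame length and the "base-layer bits first" frame layout are assumptions, not the MPEG-4 HVXC bitstream syntax, and the function names are hypothetical.

```python
def frame_bytes(rate_bps: int, frame_ms: int = 20) -> int:
    """Bytes per frame at a given rate; the 20 ms frame length is an assumption."""
    bits = rate_bps * frame_ms // 1000
    if bits % 8:
        raise ValueError("frame size is not a whole number of bytes")
    return bits // 8


def extract_base_layer(stream: bytes, full_rate: int = 4000,
                       base_rate: int = 2000, frame_ms: int = 20) -> bytes:
    """Keep only the base-layer bytes of each frame of a full-rate stream.

    Assumes (hypothetically) that every frame stores its base-layer bits first,
    followed by the enhancement bits.
    """
    full = frame_bytes(full_rate, frame_ms)
    base = frame_bytes(base_rate, frame_ms)
    if len(stream) % full:
        raise ValueError("stream length is not a whole number of frames")
    return b"".join(stream[i:i + base] for i in range(0, len(stream), full))


stream = bytes(range(20))          # two 10-byte frames at 4.0 kbps
base = extract_base_layer(stream)  # two 5-byte frames at 2.0 kbps
print(list(base))                  # [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
```

Because the low-rate stream is a strict subset, a network node could drop the enhancement bytes without re-encoding.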
Wideband speech coding using forward/backward adaptive prediction with mixed time/frequency domain excitation
J. Schnitzler, J. Eggers, C. Erdmann, P. Vary
This paper describes a wideband (7 kHz) speech coding scheme using code-excited linear prediction (CELP) with mixed time- and frequency-domain excitation. The proposed frequency-domain innovation can be used as an alternative to, or in parallel with, a time-domain codebook. In addition, an improved synthesis filter is used, consisting of a signal-dependent combination of forward-adaptive and backward-adaptive (FA/BA) structures. An experimental codec operating at 15.5 or 20.0 kbit/s is demonstrated.
DOI: https://doi.org/10.1109/SCFT.1999.781465 (published 1999-06-20)
Citations: 5
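One simple way to combine forward- and backward-adaptive prediction is to switch between them frame by frame. The sketch below assumes such a switched scheme, which may differ from the paper's signal-dependent combination: FA coefficients are computed from the current frame (a real coder would quantize and transmit them, which this sketch skips), BA coefficients from past decoded samples (free for the decoder), and the predictor with lower residual energy is chosen. Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def autocorr(x: Sequence[float], order: int) -> List[float]:
    """Autocorrelation r[0..order] of x (rectangular window, for brevity)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a[1..order] such that x[n] is approx. sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) input: remaining coefficients stay 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def residual_energy(frame: Sequence[float], coeffs: Sequence[float],
                    past: Sequence[float]) -> float:
    """Prediction-error energy over `frame`, with `past` samples as filter memory."""
    buf = list(past) + list(frame)
    total = 0.0
    for n in range(len(past), len(buf)):
        pred = sum(c * buf[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        total += (buf[n] - pred) ** 2
    return total


def choose_predictor(frame: Sequence[float], past_decoded: Sequence[float],
                     order: int = 10) -> Tuple[str, List[float]]:
    """Return ("FA", coeffs) or ("BA", coeffs), whichever predicts `frame` better."""
    fa = levinson_durbin(autocorr(frame, order), order)         # from current input
    ba = levinson_durbin(autocorr(past_decoded, order), order)  # from decoder history
    e_fa = residual_energy(frame, fa, past_decoded)
    e_ba = residual_energy(frame, ba, past_decoded)
    return ("FA", fa) if e_fa <= e_ba else ("BA", ba)


signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(320)]
mode, coeffs = choose_predictor(signal[160:], signal[:160])
print(mode, [round(c, 3) for c in coeffs[:3]])
```

For stationary segments the BA predictor tends to be competitive at zero side-information cost; at onsets the FA predictor should win, which is the usual motivation for combining the two.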
Optimized error correction of MELP speech parameters via maximum a posteriori (MAP) techniques
D.J. Rahikka, T. Fuja, T. Fazel
The U.S. Government has developed and adopted a new Federal Standard vocoder, called MELP (mixed excitation linear prediction), which operates at 2400 bps. This algorithm has quite good voice quality under benign channel-error conditions. However, under high error rates, such as may be experienced in vehicular applications, correction techniques can be employed that exploit the residual inter-frame redundancy of the MELP parameters. This paper describes experiments on the MELP algorithm combined with Viterbi decoding of a convolutional code and enhanced with maximum a posteriori (MAP) techniques that capitalize on the redundancy statistics. Both hard- and soft-decision Viterbi decoding are investigated.
DOI: https://doi.org/10.1109/SCFT.1999.781490 (published 1999-06-20)
Citations: 7
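The MAP idea can be illustrated for a single quantizer index sent over a binary symmetric channel: the decoder weights the channel likelihood of each candidate index by its inter-frame transition probability from the previously decoded index. This sketch assumes hard-decision bits, a known bit-error rate and a known transition matrix; it is not the paper's decoder, which works together with Viterbi decoding.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two indices."""
    return bin(a ^ b).count("1")


def channel_likelihood(received: int, sent: int, bits: int, p: float) -> float:
    """P(received | sent) on a binary symmetric channel with bit-error rate p."""
    d = hamming(received, sent)
    return (p ** d) * ((1.0 - p) ** (bits - d))


def map_decode(received: int, prev_index: int,
               transition: Sequence[Sequence[float]], bits: int, p: float) -> int:
    """MAP estimate: argmax_i P(received | i) * P(i | prev_index)."""
    prior = transition[prev_index]
    return max(range(len(prior)),
               key=lambda i: channel_likelihood(received, i, bits, p) * prior[i])


# Toy 2-bit parameter that usually keeps its previous value (values are made up).
transition = [[0.85 if i == j else 0.05 for i in range(4)] for j in range(4)]
# Index 1 (0b01) was sent after index 1, but one bit flipped: 0b11 arrives.
print(map_decode(0b11, prev_index=1, transition=transition, bits=2, p=0.1))  # 1
```

A plain hard decision would output 3 here; the prior from inter-frame redundancy pulls the estimate back to 1.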
Enhanced waveform interpolative coding at 4 kbps
O. Gottesman, A. Gersho
This paper presents an enhanced waveform interpolative (EWI) speech coder at 4 kbps. The system incorporates novel features such as analysis-by-synthesis (AbS) vector quantization (VQ) of the dispersion phase, AbS optimization of the slowly evolving waveform (SEW), a special pitch search for transitions, and switched-predictive AbS gain VQ. Subjective quality tests indicate that its quality exceeds that of MPEG-4 at 4 kbps and of G.723.1 at 5.3 kbps, and is slightly better than that of G.723.1 at 6.3 kbps.
DOI: https://doi.org/10.1109/SCFT.1999.781494 (published 1999-06-20)
Citations: 17
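In WI coding, the SEW is the slowly varying part of the sequence of characteristic (pitch-cycle) waveforms, and the rapidly evolving waveform (REW) is the remainder. A minimal sketch of that decomposition, assuming already aligned, equal-length waveforms and using a centred moving average over time as a crude stand-in for the low-pass filter a real WI coder would use (`span` is an illustrative parameter):

```python
import math
from typing import List, Sequence, Tuple


def sew_rew_split(cws: Sequence[Sequence[float]], span: int = 5
                  ) -> Tuple[List[List[float]], List[List[float]]]:
    """Split aligned characteristic waveforms into SEW and REW parts.

    cws[t][k] is sample k of the t-th characteristic waveform. The SEW is a
    centred moving average along the time (evolution) axis; the REW is what is left.
    """
    half = span // 2
    n_waves = len(cws)
    sew, rew = [], []
    for t in range(n_waves):
        lo, hi = max(0, t - half), min(n_waves, t + half + 1)
        avg = [sum(cws[u][k] for u in range(lo, hi)) / (hi - lo)
               for k in range(len(cws[t]))]
        sew.append(avg)
        rew.append([x - s for x, s in zip(cws[t], avg)])
    return sew, rew


# A stable pitch cycle plus a component that flips sign every cycle (noise-like).
cws = [[math.sin(2 * math.pi * k / 16) + 0.3 * (-1) ** t for k in range(16)]
       for t in range(10)]
sew, rew = sew_rew_split(cws)
print(max(abs(v) for row in rew for v in row))
```

The sign-flipping component lands mostly in the REW, while the stable pitch cycle stays in the SEW, which is the property that lets the two parts be quantized with different precision and update rates.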
A new technique for wideband enhancement of coded narrowband speech
J. Epps, W. Holmes
Telephone speech is typically bandlimited to 4 kHz, resulting in a 'muffled' quality. Coding speech with a bandwidth greater than 4 kHz reduces this distortion, but requires a higher bit rate to avoid other types of distortion. An alternative to coding wider bandwidth speech is to exploit correlations between the 0-4 kHz and 4-8 kHz speech bands to re-synthesize wideband speech from decoded narrowband speech. This paper proposes a new technique for highband spectral envelope prediction, based upon codebook mapping with codebooks split by voicing. An objective comparison with several existing methods reveals that this new technique produces the smallest highband spectral distortion. Combined with a suitable highband excitation synthesis scheme, this envelope prediction scheme produces a significant quality improvement in speech that has been coded using narrowband standards.
DOI: https://doi.org/10.1109/SCFT.1999.781522 (published 1999-06-20)
Citations: 82
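Codebook mapping for highband envelope prediction pairs each narrowband codeword with a highband envelope learned from the same training frames; splitting by voicing means keeping separate codebook pairs for voiced and unvoiced frames. A minimal lookup sketch, assuming the codebooks are already trained (training is not shown) and using toy 2-D vectors with made-up values; the names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)))


def predict_highband(nb_features: Vector, voiced: bool,
                     books: Dict[bool, Tuple[List[Vector], List[Vector]]]) -> Vector:
    """Map a narrowband feature vector to a highband envelope via paired codebooks.

    books[voiced] = (narrowband_codebook, highband_codebook); entry i of each
    is assumed to describe the same cluster of training frames.
    """
    nb_book, hb_book = books[voiced]
    return hb_book[nearest(nb_book, nb_features)]


toy_books = {
    True: ([[1.0, 0.0], [0.0, 1.0]], [[-10.0, -20.0], [-5.0, -8.0]]),
    False: ([[1.0, 1.0], [0.0, 0.0]], [[-3.0, -2.0], [-12.0, -15.0]]),
}
print(predict_highband([0.9, 0.1], True, toy_books))  # [-10.0, -20.0]
```

Separate voiced and unvoiced codebooks let the same narrowband shape map to very different highband envelopes (e.g. fricatives versus vowels).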
Design of test sequences for G.729 Annex E
S. Ragot, R. Salami, R. Lefebvre
The 11.8 kb/s extension of the G.729 codec, also known as Annex E of the G.729 Recommendation, has been ratified by the ITU-T. This paper describes how the related test sequences were designed using the fixed-point C simulation of the codec. The design method is based on the concept of coverage, already used in the design of test sequences for the G.729 codec. Coverage ensures that all possible parameter values are observed in the bitstream and that all portions of the algorithm are executed at least once. Experiments showed that this approach guarantees satisfactory reliability.
DOI: https://doi.org/10.1109/SCFT.1999.781504 (published 1999-06-20)
Citations: 1
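The bitstream half of the coverage criterion reduces to checking that every possible value of every parameter occurs at least once across the test sequences. A minimal sketch of that check; the parameter names are hypothetical, and code-path coverage of the C simulation would additionally require a coverage tool, which is not shown.

```python
from typing import Dict, Iterable, Mapping, Set


def coverage_report(frames: Iterable[Mapping[str, int]],
                    num_values: Mapping[str, int]) -> Dict[str, Set[int]]:
    """Return, for each parameter, the index values never seen in the frames.

    frames: decoded parameter indices per frame, e.g. {"gain": 3, "lag": 57}.
    num_values: number of possible values of each parameter (indices 0..n-1).
    """
    seen: Dict[str, Set[int]] = {name: set() for name in num_values}
    for frame in frames:
        for name, value in frame.items():
            if name in seen:
                seen[name].add(value)
    return {name: set(range(n)) - seen[name] for name, n in num_values.items()}


frames = [{"mode": 0, "gain": 1}, {"mode": 1, "gain": 3}]
print(coverage_report(frames, {"mode": 2, "gain": 4}))  # {'mode': set(), 'gain': {0, 2}}
```

A test-sequence designer would keep adding or editing input material until every set in the report is empty.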
Recursive coding of spectrum parameters
J. Samuelsson, P. Hedelin
Estimates of optimal performance, in terms of spectral distortion (SD), are presented for first-order time-recursive spectrum coders. Extensions of high-rate theory provide formulas for calculating these estimates and also show how to design coders with optimal VQ point density. For this purpose, the PDF of the current spectrum parameter vector given the previous one is needed. This conditional PDF is obtained analytically from a model PDF for pairs of consecutive parameter vectors, based on Gaussian mixtures. The theory gives a lower bound of 16 bits to achieve 1 dB SD. Practical coders must base the adaptive codebook design on quantized previous vectors, and experiments suggest that another 2-3 bits are needed to achieve 1 dB SD. Informal subjective tests indicate that transparent quality may be maintained at even lower rates.
DOI: https://doi.org/10.1109/SCFT.1999.781476 (published 1999-06-20)
Citations: 58
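The 1 dB target refers to the spectral distortion between the original and quantized LPC envelopes. One common way to compute it, as the RMS difference in dB between the two all-pole envelopes over a frequency grid, is sketched below; the full-band uniform grid is a simplifying assumption (practical evaluations often restrict the band), and the function names are illustrative.

```python
import cmath
import math
from typing import Sequence


def lpc_power_db(a: Sequence[float], w: float) -> float:
    """10*log10 of 1/|A(e^jw)|^2, with A(z) = 1 - sum_k a[k-1] z^-k."""
    A = 1.0 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return -10.0 * math.log10(abs(A) ** 2)


def spectral_distortion(a: Sequence[float], a_hat: Sequence[float],
                        n_points: int = 256) -> float:
    """RMS difference (dB) between two LPC envelopes on a uniform grid over [0, pi)."""
    diffs = []
    for i in range(n_points):
        w = math.pi * i / n_points
        diffs.append(lpc_power_db(a, w) - lpc_power_db(a_hat, w))
    return math.sqrt(sum(d * d for d in diffs) / n_points)


# Original vs. slightly perturbed ("quantized") second-order predictor.
print(round(spectral_distortion([1.2, -0.5], [1.18, -0.49]), 3))
```

Averaging this per-frame value over a test set gives the mean SD that figures such as "1 dB" usually refer to.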
On waveform-interpolation coding with asymptotically perfect reconstruction
T. Eriksson, W. Kleijn
For coders that must produce high speech quality, it is beneficial to have a coding structure that gives zero waveform distortion when the quantization error vanishes (asymptotically perfect reconstruction, APR). This property can be introduced into waveform interpolation (WI) coders by using perfect-reconstruction filter banks for analysis and synthesis. Unfortunately, perfect-reconstruction filter banks are, in general, associated with disadvantages such as oversampling, loss of the physical meaning of the parameters, and increased delay. These disadvantages disappear for a filter bank based on the block DFT, but that method suffers from energy discontinuities. By combining a pre-processor with a block-DFT-based WI coder, a coding structure is obtained that maintains the advantages of earlier WI coders and adds the APR property. This new structure is most useful for higher-rate WI coders.
DOI: https://doi.org/10.1109/SCFT.1999.781495 (published 1999-06-20)
Citations: 11
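The APR property of a block-DFT filter bank can be seen directly: cutting the signal into pitch-length blocks and taking a DFT of each is invertible, so with unquantized coefficients the waveform comes back up to rounding error. A minimal sketch, assuming a fixed pitch period (a real WI coder has a time-varying pitch, which is where the paper's pre-processor comes in); the names are illustrative.

```python
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Plain O(N^2) DFT of one block."""
    N = len(x)
    return [sum(x[n] * complex(math.cos(2 * math.pi * k * n / N),
                               -math.sin(2 * math.pi * k * n / N)) for n in range(N))
            for k in range(N)]


def idft(X: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping the real part (inputs here are real signals)."""
    N = len(X)
    return [(sum(X[k] * complex(math.cos(2 * math.pi * k * n / N),
                                math.sin(2 * math.pi * k * n / N)) for k in range(N)) / N).real
            for n in range(N)]


def analyze_blocks(signal: Sequence[float], period: int) -> List[List[complex]]:
    """Cut the signal into consecutive pitch-length blocks and transform each one."""
    return [dft(signal[i:i + period]) for i in range(0, len(signal) - period + 1, period)]


def synthesize_blocks(blocks: Sequence[Sequence[complex]]) -> List[float]:
    """Invert each block and concatenate: exact up to floating-point rounding."""
    out: List[float] = []
    for X in blocks:
        out.extend(idft(X))
    return out


sig = [math.sin(0.17 * n) + 0.3 * math.cos(0.05 * n * n) for n in range(3 * 37)]
rec = synthesize_blocks(analyze_blocks(sig, 37))
print(max(abs(a - b) for a, b in zip(sig, rec)))
```

Quantizing the DFT coefficients would introduce distortion that shrinks to zero as the quantization error does; the energy discontinuities mentioned in the abstract arise at block boundaries when the coefficients are modified.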