
1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351): latest articles

Phase-space portraits of speech employing mutual information and perceptual masking
M. A. Jackson, I. Burnett
The use of phase-space portraits for speech has been examined by a number of authors. In this paper we examine the use of speech entropy via mutual information to compute the embedding delay of phonemes and hence construct meaningful phase-space portraits. Since speech signals are known to be spectrally redundant, the effects of perceptual masking on phase-space portraits are also considered. The results indicate that phase-space portraits give a true indication of the underlying behaviour of phonemes without significant distortion from perceptually masked signal components.
DOI: https://doi.org/10.1109/SCFT.1999.781484 (published 1999-06-20)
Citations: 3
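The abstract above says the embedding delay is computed from mutual information but does not give the exact rule. A minimal sketch, assuming the common criterion of taking the first local minimum of a histogram-estimated mutual information between the signal and its delayed copy, might look like this. The function names, the bin count, and the synthetic test signal are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def mutual_information(x, lag, bins=16):
    """Histogram estimate, in bits, of the mutual information I(x[n]; x[n+lag])."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal

    def idx(v):
        return min(int((v - lo) / width), bins - 1)

    n = len(x) - lag
    joint = {}
    pa = [0] * bins
    pb = [0] * bins
    for i in range(n):
        a, b = idx(x[i]), idx(x[i + lag])
        joint[(a, b)] = joint.get((a, b), 0) + 1
        pa[a] += 1
        pb[b] += 1
    # Sum p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over occupied cells.
    return sum((c / n) * math.log2(c * n / (pa[a] * pb[b]))
               for (a, b), c in joint.items())


def embedding_delay(x, max_lag=50, bins=16):
    """Lag at the first local minimum of the mutual information (assumed criterion)."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1  # mi[k] belongs to lag k + 1
    return mi.index(min(mi)) + 1  # no interior minimum: use the global one


def phase_portrait(x, delay, dim=2):
    """Delay-embed x into dim-dimensional points (x[n], x[n+delay], ...)."""
    n = len(x) - (dim - 1) * delay
    return [tuple(x[i + j * delay] for j in range(dim)) for i in range(n)]


# Synthetic stand-in for a voiced segment: a noisy sinusoid (hypothetical data).
random.seed(0)
signal = [math.sin(2 * math.pi * n / 40) + 0.05 * random.gauss(0, 1)
          for n in range(2000)]
delay = embedding_delay(signal, max_lag=30)
points = phase_portrait(signal, delay)
print(delay, len(points))
```

Plotting `points` as a scatter gives the two-dimensional portrait. A histogram estimator is the simplest choice and is sensitive to the bin count, so the selected delay should be treated as approximate.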
Low complexity LSF quantization for wideband speech coding
S. Ragot, J. Adoul, R. Lefebvre, R. Salami
State-of-the-art narrowband speech coders operating from 4 to 16 kbit/s are mostly based on the code-excited linear predictive (CELP) model. They achieve good synthesis quality, usually at the expense of high coding complexity. For example, in the 8 kbit/s G.729 coder the innovation codebook search is responsible for approximately half the total coder complexity, the latter being close to 20 MIPS in a fixed-point DSP implementation. Less well known is the share taken by spectral quantization, which is around 8% of the total complexity. CELP coders are still relevant for wideband speech coding, but their complexity is greater than in the narrowband case, which becomes critical for real-time implementations. In this article we propose a two-stage algebraic-stochastic line spectral frequency (LSF) quantization scheme. It combines the strengths of algebraic and stochastic techniques, namely low computation and storage cost and good performance. The generalized Lloyd-Max algorithm is adapted for optimizing lattice codebooks obtained by spherical truncation. Simulations with a Gaussian source show that the quantization method exhibits good quality/complexity tradeoffs. Several stochastic-algebraic LSF quantizers are derived and compared to a more conventional technique.
DOI: https://doi.org/10.1109/SCFT.1999.781471 (published 1999-06-20)
Citations: 12
Enhancing the EVRC half rate by the algebraic VQ-CELP
Fenghua Liu, R. Heidari
This paper presents recent improvements to an algebraic vector quantized codebook excited linear prediction (AVQ-CELP) speech codec. The objective is to enhance the half-rate mode of the enhanced variable rate codec (EVRC). In the AVQ-CELP scheme, only the perceptually important components are encoded, and the components are selected in a way similar to ACELP. A closed-loop procedure is used to select the sub-vectors. Overlap between the selected sub-vectors is allowed, to prevent pitch-peak splitting. The selected sub-vectors are concatenated and vector quantized. An analysis-by-synthesis strategy is used to determine the optimal excitation. The generalized Lloyd algorithm (GLA) is used to optimize the AVQ codebook. To improve the synthesis quality of voiced frames, ACELP is used in strongly voiced frames. The proposed algorithm was incorporated in the Nokia CDMA handset prototype. The field testing results indicate a considerable improvement relative to the standard EVRC operating at the maximum half-rate.
DOI: https://doi.org/10.1109/SCFT.1999.781506 (published 1999-06-20)
Citations: 0
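The abstract above names the generalized Lloyd algorithm (GLA) as the codebook optimizer. As a rough illustration only, the sketch below runs plain GLA on an unstructured codebook with squared-error distortion. It does not model the AVQ sub-vector structure or the closed-loop selection described in the paper, and the training data and function names are hypothetical.

```python
import random


def nearest(codebook, v):
    """Index of the codevector closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], v)))


def generalized_lloyd(training, codebook, iterations=20):
    """Alternate nearest-neighbour partitioning and centroid updates."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        # Step 1: assign every training vector to its nearest codevector.
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest(codebook, v)].append(v)
        # Step 2: move each codevector to the centroid of its cell.
        for k, cell in enumerate(cells):
            if cell:  # an empty cell keeps its old codevector
                dim = len(cell[0])
                codebook[k] = [sum(v[d] for v in cell) / len(cell)
                               for d in range(dim)]
    return codebook


def mean_distortion(training, codebook):
    """Average squared error when each vector is coded by its nearest codevector."""
    total = 0.0
    for v in training:
        c = codebook[nearest(codebook, v)]
        total += sum((a - b) ** 2 for a, b in zip(c, v))
    return total / len(training)


# Hypothetical two-dimensional Gaussian training set, four initial codevectors.
random.seed(1)
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
initial = train[:4]
trained = generalized_lloyd(train, initial)
print(mean_distortion(train, initial), mean_distortion(train, trained))
```

Each of the two steps can only lower or preserve the mean squared error on the training set, which is why the second printed value is never larger than the first.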
On the use of LSF intermodel interlacing property for spectral quantization
Mi Suk Lee, H. Kim, S. Choi, Hwang-Soo Lee
The line spectral frequencies (LSFs) extracted from successive analysis orders are interlaced with each other. This intermodel interlacing property gives a new relationship between the closeness of LSFs and their spectral sensitivities, which enables us to propose a weighting function for LSF distortion measurement. By applying the proposed weighting function to an LSF quantizer, we can achieve better performance than when using the conventional heuristic functions. Moreover, the complexity of the proposed weighting function is much lower than that of the optimal weighting function, while their performances are almost the same.
DOI: https://doi.org/10.1109/SCFT.1999.781478 (published 1999-06-20)
Citations: 3
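To make the idea of a weighting function for LSF distortion concrete, here is a small sketch of a weighted squared LSF distance. The weights use a simple neighbour-distance heuristic (closely spaced LSFs get larger weights), which is the kind of conventional heuristic the abstract compares against. It is not the interlacing-based function the paper proposes, and all names and sample values are assumptions.

```python
import math


def neighbour_weights(lsf):
    """Heuristic weights: inverse distance to the lower and upper neighbour.

    lsf: strictly increasing frequencies in radians, inside (0, pi).
    The band edges 0 and pi act as neighbours for the first and last LSF.
    """
    padded = [0.0] + list(lsf) + [math.pi]
    return [1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
            for i in range(1, len(padded) - 1)]


def weighted_lsf_distance(lsf, lsf_quantized, weights):
    """Weighted squared error between an LSF vector and its quantized version."""
    return sum(w * (a - b) ** 2
               for w, a, b in zip(weights, lsf, lsf_quantized))


# Hypothetical 3rd-order example: the first two LSFs are close together.
lsf = [0.5, 0.6, 2.0]
w = neighbour_weights(lsf)
print(w)
print(weighted_lsf_distance(lsf, [0.52, 0.6, 2.0], w))
print(weighted_lsf_distance(lsf, [0.5, 0.6, 2.02], w))
```

With these weights, the same 0.02 rad error costs more on the closely spaced first LSF than on the isolated third one, reflecting the link between LSF closeness and spectral sensitivity that the abstract refers to.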
Bandwidth expansion of speech based on vector quantization of the mel frequency cepstral coefficients
N. Enbom, W. Kleijn
Telephone speech is usually limited to less than 4 kHz in bandwidth. This bandwidth limitation results in the typical sound of telephone speech. We present a new method of regenerating the high frequencies (4-8 kHz) based on vector quantization of the mel-frequency cepstral coefficients (MFCC). We also present two methods to avoid perceptually annoying overestimates of the signal power in the high-band. Listening tests show the benefits of the new procedures. Use of MFCC for vector quantization instead of traditionally used spectral representations improves the quality of the speech significantly. Tests also show that the wide-band speech reconstructed with the method is significantly more pleasant to the human ear than the original narrowband speech.
DOI: https://doi.org/10.1109/SCFT.1999.781521 (published 1999-06-20)
Citations: 69
Speech enhancement and coding in harsh acoustic noise environments
J. Collura
Recent advances in speech enhancement and noise pre-processing algorithms have dramatically improved the quality and intelligibility of speech signals, both in the presence of acoustic noise and in benign environments. Combining speech enhancement with voice coding algorithms in governmental wireless communications systems is an important application area. This paper first introduces one such system: the 2.4 kbps US Government Military Standard mixed excitation linear prediction (MELP) speech coding algorithm coupled with a speech enhancement algorithm developed by AT&T Research Labs. Next, the paper discusses the test conditions and results and provides an interpretation of these results. Finally, a general discussion of related issues and conclusions is presented.
DOI: https://doi.org/10.1109/SCFT.1999.781518 (published 1999-06-20)
Citations: 16
5-kHz-bandwidth speech coder at 4-8 kbit/s
N. Harada, H. Ohmuro
We propose a 5-kHz-bandwidth CELP speech coder for various multimedia applications. For portability, the coder has three bit rate modes: MODE8 (7.8 kbit/s), MODE6 (5.75 kbit/s) and MODE4 (3.95 kbit/s). The bit rate mode can be switched frame by frame. In order to achieve both low bit rate and naturalness, 5-kHz-bandwidth speech signals are used instead of 3.4 kHz or 7-kHz-bandwidth signals. The speech signals under consideration are band-limited to 5 kHz and are sampled at 11.025 kHz. Subjective listening tests indicated that the 5-kHz-bandwidth is effective for low-bit-rate speech coders. The mean opinion score and comparative mean opinion score showed that the quality of this coder in MODE8 (5 kHz, 7.8 kbit/s) is better than that of the G.729 (3.4 kHz, 8 kbit/s), G.722 (7 kHz, 48 kbit/s), and equivalent to that of G.729 Annex E (3.4 kHz, 11.8 kbit/s). In addition, at MODE6 (5 kHz, 5.75 kbit/s), the quality of this coder is better than that of G.723.1 (3.4 kHz, 6.3 kbit/s), and equivalent to G.729 (3.4 kHz, and kbit/s). We also determine the relationship among characterizations of subjective quality, bandwidth and noisiness.
DOI: https://doi.org/10.1109/SCFT.1999.781468 (published 1999-06-20)
Citations: 0
Voice over IP systems with speech bitrate adaptation based on MPEG-4 wideband CELP
T. Nomura, M. Iwadare
This paper proposes bitrate adaptation schemes for real-time voice over IP applications, based on multirate and scalable coding capabilities in MPEG-4 CELP. For IP telephony applications, the adaptation scheme based on multi-rate coding is utilized to achieve the best coding quality at a given bitrate. For broadcast applications, the scalable CELP coder produces a layered bitstream and the bitrate control at IP routers determines the number of bitstream layers depending on the network throughputs. Performance evaluation of bitrate adaptation using the MPEG-4 wideband CELP coder is presented.
DOI: https://doi.org/10.1109/SCFT.1999.781508 (published 1999-06-20)
Citations: 4
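For the broadcast case described above, where a router decides how many layers of a scalable bitstream to forward, the decision can be sketched as follows. The layer bitrates and throughput figures are hypothetical values chosen for illustration and are not taken from the paper or from the MPEG-4 specification.

```python
def layers_to_forward(layer_rates_kbps, throughput_kbps):
    """Count how many leading layers fit within the available throughput.

    Layers are kept in order: an enhancement layer is only useful together
    with every layer below it, so counting stops at the first layer that
    does not fit.
    """
    used = 0.0
    count = 0
    for rate in layer_rates_kbps:
        if used + rate > throughput_kbps:
            break
        used += rate
        count += 1
    return count


# Hypothetical layout: one base layer followed by three enhancement layers.
rates = [6.0, 2.0, 2.0, 2.0]
for throughput in (5.0, 10.5, 12.0):
    print(throughput, layers_to_forward(rates, throughput))
```

Because the decision only truncates the layered bitstream, a router following this scheme would not need to re-encode the speech.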
The use of LSF-based phonetic classification in low-rate coder design
J. J. Parry, I. Burnett, J. Chicharo
In this paper we investigate a novel approach to low-bit rate quantisation of the spectral parameters of speech. This approach incorporates phonetic information into the structure of line spectral frequency (LSF) codebooks. As clear relationships exist between phonetic segments and LSFs, phonetic events can be expressed in terms of the structure of an LSF codebook and the successive vectors chosen by it. The investigation leads to the conclusion that the structure of LSF codebooks can be usefully employed in phonetic classification as a front end to multi-modal phonetic vocoding.
DOI: https://doi.org/10.1109/SCFT.1999.781480 (published 1999-06-20)
Citations: 2
Trellis coded quantization
T. Fischer
Trellis coded quantization (TCQ) is an efficient form of multidimensional quantization that achieves portions of the possible point density, space filling, and granular gains promised by vector quantization. For memoryless sources, the combination of TCQ and a suitable entropy code can provide performance within 0.5 dB of the rate-distortion limit.
DOI: https://doi.org/10.1109/SCFT.1999.781487 (published 1999-06-20)
Citations: 1