Evaluation of Efficient Compression Properties of the Complete Oscillator Method, Part 2: Speech Coding

Anton Y. Yen, I. Gorodnitsky
2013 Data Compression Conference, published 2013-03-20. DOI: 10.1109/DCC.2013.110 (https://doi.org/10.1109/DCC.2013.110)

Abstract

Summary form only given. This paper examines the performance of the recently proposed Complete Oscillator Method (COM) for speech coding. The COM is shown to provide several advantages over traditional predictive coding techniques. Unlike the cascaded approach employed by codecs such as Adaptive Multi-Rate (AMR), the COM encodes short- and long-term data features jointly using a single, flexible representation. Joint approaches have previously been shown to yield efficiency gains [1]. Furthermore, the COM does not always require an explicit encoding of the residual error to reconstruct the signal. Because AMR can allocate as much as 85% of its coding budget to encoding the residual, there is substantial motivation for finding alternatives to source-filter coding methods.

The first part of the paper compares the synthesis of speech frames using the COM with synthesis using a combination of linear predictor and adaptive codebook (LPAC), in order to assess the deterministic modeling capabilities of the COM relative to linear predictive codes. With both approaches optimized by minimizing the perceptually weighted error (PWE) between the original and reconstructed speech, the COM achieves lower PWE on average than LPAC as implemented in the AMR standard for several types of speech. The COM improved PWE in 78.20% of voiced frames, yielding a 2.02 dB PWE gain on average. For voiced-to-unvoiced transitions, the COM improved PWE in 76.75% of the frames, with a 1.26 dB average gain. For unvoiced speech, the COM consistently improved PWE, but the average gain was not significant. Only for unvoiced-to-voiced transitions did the COM not produce gains in average PWE.

The second part of the paper compares speech frames synthesized with the COM at several bit rates against the standard AMR and Speex codecs, showing that the COM can produce speech of comparable quality in a significant percentage of frames. Using weighted spectral slope distance (WSS) as a metric, a 5.5 kbps COM outperformed 12.2 kbps AMR in 24.12% of speech frames. These results are not intended to demonstrate the workings of a COM-only speech coder, but rather to suggest how existing codecs can achieve lower bit rates by using the COM to encode some subset of frames. For example, by using the COM in the lowest bit rate mode sufficient to achieve a WSS similar to that of 12.2 kbps AMR, the average bit rate can potentially be reduced to 9.16 kbps.
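The frame-level substitution idea in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the function names, the per-frame WSS dictionaries, and the selection rule (a COM mode is accepted when its WSS is no larger than the AMR WSS, treating lower WSS as better) are all assumptions made for illustration. The only figures taken from the abstract are the 5.5 kbps and 12.2 kbps rates and the 24.12% frame fraction.

```python
from typing import Mapping, Sequence, Tuple

AMR_RATE_KBPS = 12.2  # reference AMR mode named in the abstract


def pick_rate(com_wss: Mapping[float, float], amr_wss: float,
              amr_rate: float = AMR_RATE_KBPS) -> float:
    """Return the lowest COM bit rate whose WSS is no worse than AMR's.

    com_wss maps a hypothetical COM bit rate (kbps) to the WSS measured
    for this frame at that rate. Lower WSS is assumed to be better.
    Falls back to the AMR rate when no COM mode qualifies.
    """
    for rate in sorted(com_wss):
        if rate < amr_rate and com_wss[rate] <= amr_wss:
            return rate
    return amr_rate


def average_rate(frames: Sequence[Tuple[Mapping[float, float], float]]) -> float:
    """Average bit rate over frames, each given as (com_wss, amr_wss)."""
    rates = [pick_rate(com, amr) for com, amr in frames]
    return sum(rates) / len(rates)


def two_mode_average(fraction_low: float, low_rate: float,
                     high_rate: float) -> float:
    """Average rate when a fraction of frames uses the lower rate."""
    return fraction_low * low_rate + (1.0 - fraction_low) * high_rate


if __name__ == "__main__":
    # Made-up WSS values for three frames, purely to exercise the logic.
    demo = [
        ({5.5: 10.0, 7.4: 8.0}, 9.0),  # only the 7.4 kbps mode qualifies
        ({5.5: 5.0}, 9.0),             # the 5.5 kbps mode qualifies
        ({5.5: 20.0}, 9.0),            # no COM mode qualifies, use AMR
    ]
    print(average_rate(demo))
    # Replacing only the 24.12% of frames where 5.5 kbps COM wins:
    print(two_mode_average(0.2412, 5.5, 12.2))
```

With only the two rates stated in the abstract, swapping 24.12% of frames to 5.5 kbps gives roughly 10.58 kbps. The abstract's 9.16 kbps figure relies on the authors' own set of COM modes and their "similar WSS" criterion, neither of which is specified here, so the sketch should not be expected to reproduce it.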