Sinusoidal speech coding at 2.4 kbps using an improved phase matching algorithm

Conference Record of the Thirty-First Asilomar Conference on Signals, Systems and Computers (Cat. No.97CB36136). Pub Date: 1997-11-02. DOI: 10.1109/ACSSC.1997.679071

S. Ahmadi, A.S. Spanias


Abstract

This paper addresses the design, development, evaluation, and implementation of efficient low bit rate speech coding algorithms based on the sinusoidal model. A series of algorithms have been developed for pitch frequency determination and voicing detection, simultaneous modeling of the sinusoidal amplitudes and phases, and mid-frame interpolation. An improved sinusoidal phase matching algorithm is presented, where short-time sinusoidal phases are approximated using an elaborate combination of linear prediction, spectral sampling, delay compensation, and phase correction techniques. A voicing-dependent perceptual split vector quantization scheme is used to encode the sinusoidal amplitudes. The perceptual properties of the human auditory system are effectively exploited in the developed algorithms. The algorithms have been successfully integrated into a 2.4 kbps sinusoidal coder. The performance of the 2.4 kbps coder has been evaluated in terms of subjective tests such as the mean opinion score and the diagnostic rhyme test, as well as some perceptually-motivated objective distortion measures. Performance analysis on a large speech database indicates that the use of the proposed algorithms resulted in considerable improvement in temporal and spectral signal matching, as well as improved subjective quality of the reproduced speech.
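As a rough illustration of the sinusoidal model the abstract refers to, the sketch below synthesizes one voiced frame as a sum of harmonics of the pitch frequency, s[n] = sum_k A_k cos(2*pi*k*f0*n/fs + phi_k), and computes the per-frame bit budget implied by a 2.4 kbps rate. It is not the authors' coder: the function names, the 8 kHz sampling rate, and the 20 ms frame length are assumptions chosen for illustration, and the phase matching, interpolation, and quantization stages described above are not modeled.

```python
import math


def synthesize_frame(f0_hz, amplitudes, phases, num_samples, sample_rate_hz=8000.0):
    """Sum harmonically related sinusoids for one frame.

    The k-th entry of `amplitudes` and `phases` (k = 1, 2, ...) describes the
    harmonic at k * f0_hz. The 8 kHz default sampling rate is an assumption.
    """
    if len(amplitudes) != len(phases):
        raise ValueError("need exactly one phase per amplitude")
    frame = []
    for n in range(num_samples):
        sample = 0.0
        for k, (a_k, phi_k) in enumerate(zip(amplitudes, phases), start=1):
            omega_k = 2.0 * math.pi * k * f0_hz / sample_rate_hz  # rad/sample
            sample += a_k * math.cos(omega_k * n + phi_k)
        frame.append(sample)
    return frame


def bits_per_frame(bit_rate_bps, frame_ms):
    """Bits available per frame at a given bit rate and frame length."""
    return bit_rate_bps * frame_ms / 1000.0


# Hypothetical example: a 100 Hz pitch with two harmonics, 160 samples
# (20 ms at the assumed 8 kHz sampling rate).
frame = synthesize_frame(100.0, [1.0, 0.5], [0.0, 0.0], 160)
budget = bits_per_frame(2400, 20)  # assumed 20 ms frame length
```

Under these assumed values, all pitch, voicing, amplitude, and phase information for a frame would have to fit in the budget returned by `bits_per_frame`, which is why a low-rate sinusoidal coder has to model phases compactly rather than transmit them directly.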
