匹配追求正弦语音编码

Ç. Etemoglu, V. Cuperman
{"title":"匹配追求正弦语音编码","authors":"Ç. Etemoglu, V. Cuperman","doi":"10.1109/TSA.2003.815520","DOIUrl":null,"url":null,"abstract":"This paper introduces a sinusoidal modeling technique for low bit rate speech coding wherein the parameters for each sinusoidal component are sequentially extracted by a closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of sinusoids is restricted to sets of frequency intervals or bins, which in conjunction with the closed-loop analysis allow us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one of them represents harmonically related and the other one nonharmonically related components of the voiced segment. This approach eliminates the need for voicing dependent cutoff frequency that is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce frequency bin vector quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook in order to represent the frequency domain information in transition regions. Our use of FBVQ with closed-loop searching contribute to an improvement of speech quality in transition frames. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. To demonstrate the viability and the advantages of the new models studied, we designed a 4 kbps matching pursuits sinusoidal speech coder. Subjective results indicate that the proposed coder at 4 kbps has quality exceeding the 6.3 kbps G.723.1 coder.","PeriodicalId":13155,"journal":{"name":"IEEE Trans. Speech Audio Process.","volume":"6 1","pages":"413-424"},"PeriodicalIF":0.0000,"publicationDate":"2003-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Matching pursuits sinusoidal speech coding\",\"authors\":\"Ç. Etemoglu, V. Cuperman\",\"doi\":\"10.1109/TSA.2003.815520\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper introduces a sinusoidal modeling technique for low bit rate speech coding wherein the parameters for each sinusoidal component are sequentially extracted by a closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of sinusoids is restricted to sets of frequency intervals or bins, which in conjunction with the closed-loop analysis allow us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one of them represents harmonically related and the other one nonharmonically related components of the voiced segment. This approach eliminates the need for voicing dependent cutoff frequency that is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce frequency bin vector quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook in order to represent the frequency domain information in transition regions. Our use of FBVQ with closed-loop searching contribute to an improvement of speech quality in transition frames. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. To demonstrate the viability and the advantages of the new models studied, we designed a 4 kbps matching pursuits sinusoidal speech coder. Subjective results indicate that the proposed coder at 4 kbps has quality exceeding the 6.3 kbps G.723.1 coder.\",\"PeriodicalId\":13155,\"journal\":{\"name\":\"IEEE Trans. Speech Audio Process.\",\"volume\":\"6 1\",\"pages\":\"413-424\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-08-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Trans. Speech Audio Process.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TSA.2003.815520\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Trans. Speech Audio Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TSA.2003.815520","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

摘要

本文介绍了一种用于低比特率语音编码的正弦建模技术,其中每个正弦分量的参数通过闭环分析顺序提取。语音线性预测(LP)残差的正弦建模是在用正弦波字典匹配追踪的一般框架内进行的。正弦波的频率空间被限制为一组频率间隔或箱,这与闭环分析相结合,使我们能够将正弦波的频率映射为有效量化的频率向量。在浊音帧中,产生两组频率向量,其中一组表示浊音段的谐波相关分量,另一组表示非谐波相关分量。这种方法消除了难以正确估计和在低比特率下量化的语音相关截止频率的需要。在过渡帧中,为了有效地提取和量化低频残差正弦表示所需的频率集,我们引入了频率本向量量化(FBVQ)。FBVQ从频率码本中选择一个频率间隔不均匀的向量来表示过渡区域的频域信息。我们将FBVQ与闭环搜索相结合,有助于提高过渡帧的语音质量。利用听觉感知的临界频带概念来定义频率箱,提高了编码方案的有效性。为了证明新模型的可行性和优势,我们设计了一个4kbps匹配追踪正弦语音编码器。主观测试结果表明,4kbps编码器的质量优于6.3 kbps的G.723.1编码器。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Matching pursuits sinusoidal speech coding
This paper introduces a sinusoidal modeling technique for low bit rate speech coding wherein the parameters for each sinusoidal component are sequentially extracted by a closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of sinusoids is restricted to sets of frequency intervals or bins, which in conjunction with the closed-loop analysis allow us to map the frequencies of the sinusoids into a frequency vector that is efficiently quantized. In voiced frames, two sets of frequency vectors are generated: one of them represents harmonically related and the other one nonharmonically related components of the voiced segment. This approach eliminates the need for voicing dependent cutoff frequency that is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce frequency bin vector quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook in order to represent the frequency domain information in transition regions. Our use of FBVQ with closed-loop searching contribute to an improvement of speech quality in transition frames. The effectiveness of the coding scheme is enhanced by exploiting the critical band concept of auditory perception in defining the frequency bins. To demonstrate the viability and the advantages of the new models studied, we designed a 4 kbps matching pursuits sinusoidal speech coder. Subjective results indicate that the proposed coder at 4 kbps has quality exceeding the 6.3 kbps G.723.1 coder.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Errata to "Using Steady-State Suppression to Improve Speech Intelligibility in Reverberant Environments for Elderly Listeners" Farewell Editorial Inaugural Editorial: Riding the Tidal Wave of Human-Centric Information Processing - Innovate, Outreach, Collaborate, Connect, Expand, and Win Three-Dimensional Sound Field Reproduction Using Multiple Circular Loudspeaker Arrays Introduction to the Special Issue on Processing Reverberant Speech: Methodologies and Applications
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1