A variable-bit-rate speech coding algorithm based on enhanced mixed excitation linear prediction

2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) Pub Date : 2016-10-01 DOI:10.1109/CISP-BMEI.2016.7852841

Ye Li, Qiuyun Hao, P. Zhang, Jingsai Jiang, Xiaofeng Ma, Yanhong Fan, H. V. Davydau


Citations: 7

Abstract

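To improve the channel bandwidth utilization of voice communication, this paper proposes a variable-bit-rate speech coding algorithm based on enhanced mixed excitation linear prediction (MELPe). In voice communication, talking occupies only about 40% of the time; the rest is silence or background noise. In addition, in low-bit-rate speech coding algorithms an unvoiced frame usually requires a lower transmission rate than a voiced one. Always coding at the same bit rate therefore wastes channel resources. In the proposed algorithm, voice activity detection (VAD) divides the input signal into speech and silence, and speech frames are further classified as voiced or unvoiced. Each frame type is coded and transmitted at a different rate. In a voiced frame, all parameters are encoded, transmitted, and decoded. In an unvoiced frame, only the gain parameters, line spectral frequency (LSF) parameters, pitch parameters, and overall voicing are encoded, transmitted, and decoded. In a silence frame, only the gain parameters and the first-level LSF parameters are encoded, transmitted, and decoded. When talking occupies about 40% of the time, the average coding rate of the proposed variable-bit-rate vocoder can reach 1.33 kbps, compared with 2.4 kbps for the traditional MELPe vocoder, while the two achieve the same synthetic speech quality. Experimental results show that the proposed method reduces the average coding rate and that the synthesized background noise is subjectively comfortable to listen to.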
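
The frame-dependent coding scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `FrameType`, `classify`, and `average_rate_kbps` are hypothetical, the VAD and voicing decisions are taken as given boolean inputs, and the per-class bit rates and time fractions in the usage lines are placeholder values that do not come from the paper.

```python
from enum import Enum


class FrameType(Enum):
    VOICED = "voiced"
    UNVOICED = "unvoiced"
    SILENCE = "silence"


# Parameters transmitted for each frame class, as listed in the abstract.
TRANSMITTED = {
    FrameType.VOICED: ["all parameters"],
    FrameType.UNVOICED: ["gain", "LSF", "pitch", "overall voicing"],
    FrameType.SILENCE: ["gain", "first-level LSF"],
}


def classify(is_speech: bool, is_voiced: bool) -> FrameType:
    """Map a VAD decision and a voicing decision to a frame class.

    Both decisions are assumed to be computed elsewhere; how the paper
    derives them is not specified in the abstract.
    """
    if not is_speech:
        return FrameType.SILENCE
    return FrameType.VOICED if is_voiced else FrameType.UNVOICED


def average_rate_kbps(fractions: dict, rates_kbps: dict) -> float:
    """Time-weighted average coding rate over the frame classes."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    return sum(fractions[t] * rates_kbps[t] for t in fractions)


# Placeholder values for illustration only (NOT taken from the paper).
example_fractions = {
    FrameType.VOICED: 0.3,
    FrameType.UNVOICED: 0.1,
    FrameType.SILENCE: 0.6,
}
example_rates = {
    FrameType.VOICED: 2.4,
    FrameType.UNVOICED: 1.2,
    FrameType.SILENCE: 0.6,
}
example_average = average_rate_kbps(example_fractions, example_rates)
```

The average is a weighted sum, so the saving over a fixed-rate coder grows with the share of time spent in the cheaper unvoiced and silence classes.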


