
1999 IEEE Workshop on Speech Coding Proceedings. Model, Coders, and Error Criteria (Cat. No.99EX351): Latest Papers

Speaker adaptation in a phonetic vocoding environment
C. Ribeiro, I. Trancoso
The coder proposed in this paper falls in the class of segmental vocoders known as phonetic vocoders. Speaker recognisability is one of the main problems faced by vocoders at the lowest bit rates, given the need to reduce speaker-specific information. Hence, phonetic vocoders are well suited to speaker-dependent coding, and can achieve bit rates as low as 250 bit/s. For speaker-independent coding, a speaker adaptation methodology is adopted, although it results in higher bit rates to transmit the speaker-specific information. To further reduce the corresponding bit rate, a new method is proposed that exploits the intra-speaker correlation for the same phone.
DOI: 10.1109/SCFT.1999.781485 (published 1999-06-20)
Citations: 6
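The idea of exploiting intra-speaker correlation for the same phone can be illustrated with a generic per-phone differential coder. This is a minimal sketch, not the authors' method: it assumes the speaker-specific information is a short vector (for example a spectral offset) sent with each phone occurrence, uniformly quantized with a hypothetical step size. `encode_offsets` and `decode_offsets` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: per-phone differential coding of speaker-specific vectors.
Stream = List[Tuple[str, List[int]]]

def quantize(x: float, step: float) -> int:
    """Uniform scalar quantizer returning an integer index."""
    return round(x / step)

def encode_offsets(phones: List[str], offsets: List[List[float]],
                   step: float = 0.1) -> Stream:
    """Code each offset as the quantized difference from the last
    reconstructed offset of the same phone (zeros on first occurrence)."""
    memory: Dict[str, List[float]] = {}
    stream: Stream = []
    for phone, vec in zip(phones, offsets):
        prev = memory.get(phone, [0.0] * len(vec))
        idx = [quantize(v - p, step) for v, p in zip(vec, prev)]
        # Store the decoder's reconstruction so encoder and decoder stay in sync.
        memory[phone] = [p + i * step for p, i in zip(prev, idx)]
        stream.append((phone, idx))
    return stream

def decode_offsets(stream: Stream, step: float = 0.1) -> List[List[float]]:
    """Invert encode_offsets using the same per-phone memory."""
    memory: Dict[str, List[float]] = {}
    out: List[List[float]] = []
    for phone, idx in stream:
        prev = memory.get(phone, [0.0] * len(idx))
        rec = [p + i * step for p, i in zip(prev, idx)]
        memory[phone] = rec
        out.append(rec)
    return out
```

When a speaker's realisations of a phone are similar, repeated occurrences cost only small residual indices (often zero), which an entropy coder could then represent with very few bits.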
Coding the line spectral frequencies by jointly optimized MA prediction and vector quantization
Y. Shoham
This paper presents a method for designing and optimizing predictive vector quantizers (PVQ) for coding the line spectral frequencies (LSF) in LPC-based speech and audio coders. The algorithm is based on iterative optimization of the predictors and the vector-quantizer codebooks. It is shown that the proposed method yields high quality LSF predictive quantizers with performance exceeding that of the PVQ used in the G.729 standard.
DOI: 10.1109/SCFT.1999.781479 (published 1999-06-20)
Citations: 2
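For readers unfamiliar with MA-predictive LSF quantization, the sketch below shows the basic encoder loop: a fixed mean plus a per-component weighted sum of past *quantized residuals* predicts the current LSF vector, and only the prediction residual is vector-quantized. Because the predictor uses past residuals rather than past LSFs, a channel error affects only as many frames as the predictor order. The mean, MA coefficients, and tiny codebook are hypothetical, the search is exhaustive, and the paper's joint iterative optimization of predictor and codebooks is not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def nearest(codebook: Sequence[Vector], target: Vector) -> int:
    """Exhaustive search for the codevector with minimum squared error."""
    def err(c: Vector) -> float:
        return sum((t - x) ** 2 for t, x in zip(target, c))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

def ma_pvq_encode(frames: Sequence[Vector], mean: Vector,
                  ma_coeffs: Sequence[Vector],
                  codebook: Sequence[Vector]) -> Tuple[List[int], List[Vector]]:
    """MA-predictive VQ of LSF vectors (hypothetical parameters).

    Prediction = mean + sum_k ma_coeffs[k] * (quantized residual k frames ago),
    applied component-wise. Returns indices and decoder-side reconstructions.
    """
    order, dim = len(ma_coeffs), len(mean)
    past = [[0.0] * dim for _ in range(order)]   # past[0] = most recent residual
    indices: List[int] = []
    recon: List[Vector] = []
    for x in frames:
        pred = [mean[d] + sum(ma_coeffs[k][d] * past[k][d] for k in range(order))
                for d in range(dim)]
        residual = [x[d] - pred[d] for d in range(dim)]
        i = nearest(codebook, residual)
        q = codebook[i]
        indices.append(i)
        recon.append([pred[d] + q[d] for d in range(dim)])
        past = [list(q)] + past[:-1]             # shift the residual memory
    return indices, recon
```

A decoder reproduces `recon` from the indices alone by running the same prediction with the same residual memory.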
Reverse water-filling in predictive encoding of speech
S. Andersen, W. Kleijn
Reverse water-filling suggests that, at low bit rates, the synthesis filter for predictive encoding should differ from the model filter of the signal to be encoded. However, reverse water-filling is derived under the assumptions of optimal encoding and a stationary Gaussian signal. By means of simple experiments, we show that reverse water-filling applies to predictive encoding of speech. For a vector analysis-by-synthesis encoder based on a first-order autoregressive signal model, using a synthesis filter derived from reverse water-filling consistently improved segmental SNR.
DOI: 10.1109/SCFT.1999.781499 (published 1999-06-20)
Citations: 10
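Reverse water-filling is the textbook rate-distortion solution for independent Gaussian components under a total squared-error budget: each component with variance σ_i² is reproduced with distortion D_i = min(θ, σ_i²), and the rate is R = Σ ½ log2(σ_i² / D_i), so components below the "water level" θ receive no bits. The sketch below finds θ by bisection for a given total distortion. It covers only this idealized Gaussian case, not the paper's AR(1) analysis-by-synthesis experiments; the function name is hypothetical.

```python
import math
from typing import List, Tuple

def reverse_water_fill(variances: List[float], total_distortion: float,
                       iters: int = 100) -> Tuple[float, List[float], float]:
    """Return (theta, per-component distortions, total rate in bits)
    for independent Gaussian components under a sum-MSE constraint."""
    if min(variances) <= 0:
        raise ValueError("variances must be positive")
    if not 0 < total_distortion <= sum(variances):
        raise ValueError("total_distortion must lie in (0, sum(variances)]")
    lo, hi = 0.0, max(variances)
    for _ in range(iters):                      # bisection on the water level
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in variances) < total_distortion:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    dist = [min(theta, v) for v in variances]
    rate = sum(0.5 * math.log2(v / d) for v, d in zip(variances, dist))
    return theta, dist, rate
```

For variances [4, 1, 0.25] and a total distortion of 1.25, the water level is θ = 0.5 and the weakest component gets zero rate. In the spectral version of the result, the reproduction spectrum is max(0, S(ω) − θ) rather than S(ω) itself, which is why the optimal synthesis filter departs from the signal model filter at low rates.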
Adapting waveform interpolation (with pitch-spaced subbands) for quantisation
N. R. Chong, I. Burnett, J. Chicharo
Adaptation of the waveform interpolation (WI) paradigm to allow waveform coding of speech signals was reported by Kleijn et al. (see Proc. 5th Int. Conf. Spoken Language Processing, Sydney, Australia, Dec. 1998). However, since the signal is time-warped to a constant pitch, processing the surface derived from the new technique depends heavily on an accurate pitch track. To facilitate vector quantisation, the pitch track must be manipulated to ensure phase alignment of critically sampled pitch periods. In addition, pitch cycles following unvoiced segments must also carry the same phase offset. The adjusted pitch track is then used to re-warp the residual signal. The effects of warping and pitch inaccuracies on the transformed warped periods are also discussed.
DOI: 10.1109/SCFT.1999.781496 (published 1999-06-20)
Citations: 9
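The constant-pitch time warp that the abstract depends on can be illustrated by resampling every pitch cycle to the same number of samples, which stacks the cycles into a 2-D surface. A minimal sketch, assuming the cycle boundaries are already known from a pitch track and using linear interpolation (a real coder would more likely use band-limited interpolation); the function names and the 64-sample default are hypothetical, and the paper's pitch-track adjustment for phase alignment is not shown.

```python
from typing import List

def warp_cycle(cycle: List[float], target_len: int) -> List[float]:
    """Resample one pitch cycle to target_len samples by linear interpolation.
    The cycle is treated as periodic, so the last output samples interpolate
    toward the first sample of the cycle."""
    n = len(cycle)
    out = []
    for j in range(target_len):
        pos = j * n / target_len          # fractional position in the cycle
        i = int(pos)
        frac = pos - i
        nxt = cycle[(i + 1) % n]          # wrap around: the cycle is periodic
        out.append((1 - frac) * cycle[i] + frac * nxt)
    return out

def warp_signal(signal: List[float], boundaries: List[int],
                target_len: int = 64) -> List[List[float]]:
    """Split a signal at pitch-cycle boundaries and warp every cycle to the
    same length, giving one row per cycle."""
    return [warp_cycle(signal[a:b], target_len)
            for a, b in zip(boundaries[:-1], boundaries[1:])]
```

Because each row is aligned only if the boundaries fall at consistent phases, an inaccurate pitch track shifts the waveform within the rows, which is the sensitivity the abstract describes.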
Robust speech transmission over noisy channels employing non-linear block codes
S. Heinen, S. Bleck, P. Vary
In medium to low bit rate speech codecs, the speech signal is represented by a set of parameters. The most important concept at present is code excited linear predictive (CELP) coding. A speech segment of typically 10 to 20 ms is described in terms of prediction coefficients, gain factors and excitation vectors. Due to the high compression rates (0.5 to 1.5 bits per speech sample), some of the parameters are highly sensitive to channel noise. In this paper we present a new error protection technique based on a joint optimization of parameter quantization and a redundant non-linear block coding scheme. For parameter reconstruction, the principle of soft-bit source decoding is applied. The proposed technique can be combined with conventional error protection such as convolutional coding, and allows a flexible division of the gross data rate between source coding and error protection.
DOI: 10.1109/SCFT.1999.781488 (published 1999-06-20)
Citations: 6
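Soft-bit source decoding replaces a hard decision on the received bits with a minimum mean-square-error estimate of the parameter: each quantizer reconstruction level is weighted by its a posteriori probability, given the channel's per-bit reliabilities and the parameter's a priori distribution. The sketch assumes a memoryless channel that delivers log-likelihood ratios defined as log(P(b=0)/P(b=1)) and a natural-binary index sent MSB first; these conventions and the function names are assumptions, and the paper's non-linear block codes are not modelled.

```python
import math
from typing import Sequence

def bit_prob(llr: float, bit: int) -> float:
    """P(bit | channel output) from an LLR defined as log(P(b=0) / P(b=1))."""
    p0 = 1.0 / (1.0 + math.exp(-llr))
    return p0 if bit == 0 else 1.0 - p0

def soft_decode(llrs: Sequence[float], levels: Sequence[float],
                prior: Sequence[float]) -> float:
    """MMSE estimate: sum_i levels[i] * P(i | received soft bits).

    Index i (0 <= i < 2**len(llrs)) is assumed sent MSB first; levels and
    prior must both have 2**len(llrs) entries.
    """
    nbits = len(llrs)
    weights = []
    for i, p in enumerate(prior):
        bits = [(i >> (nbits - 1 - k)) & 1 for k in range(nbits)]
        likelihood = math.prod(bit_prob(l, b) for l, b in zip(llrs, bits))
        weights.append(p * likelihood)      # unnormalized a posteriori probability
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, levels)) / total
```

With reliable bits the estimate collapses to the transmitted level; with no channel information (all LLRs zero) it falls back to the prior mean, so errors degrade the parameter gracefully instead of producing a wrong level.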
Fast spherical code decoding algorithms for the residual codebook in CELP coders
K. Koppinen, T. Mikkonen
A new method is presented for fast decoding when algebraic codes are used for the fixed codebook in CELP speech coders. Unlike previous search methods, it is based on the trellis structure of a block code and allows a fast, optimal search of the residual codebook even with a combined scalar gain. The method is flexible, allowing long block lengths and the use of any code, including nonlinear ones. Its performance does not yet match that of standard algebraic coding methods, but further refinements may make it viable.
DOI: 10.1109/SCFT.1999.781501 (published 1999-06-20)
Citations: 1
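The key idea, searching a block code through its trellis with the Viterbi algorithm instead of exhaustively, can be shown on the simplest possible case. As an illustration only (not the authors' codes, and ignoring the weighted synthesis filter and their gain handling), take ±1 codevectors of a single parity-check code, i.e. vectors with an even number of −1 entries. All such vectors have the same norm, so for a fixed positive gain, maximizing the correlation with the target is equivalent to minimizing Euclidean distance. The trellis state is the running parity; the function name is hypothetical.

```python
from typing import List, Tuple

def viterbi_spc(target: List[float]) -> Tuple[List[int], float]:
    """Return the +/-1 codeword with an even number of -1 entries (single
    parity-check code) that maximizes sum(target[i] * c[i]), and that value.

    Trellis state = parity of the number of -1 entries chosen so far.
    """
    neg_inf = float("-inf")
    score = [0.0, neg_inf]                 # best metric ending in parity 0 / 1
    paths: List[List[int]] = [[], []]
    for t in target:
        new_score = [neg_inf, neg_inf]
        new_paths: List[List[int]] = [[], []]
        for state in (0, 1):
            if score[state] == neg_inf:    # state not reachable yet
                continue
            for symbol in (1, -1):
                nxt = state if symbol == 1 else 1 - state  # a -1 flips parity
                metric = score[state] + t * symbol
                if metric > new_score[nxt]:
                    new_score[nxt] = metric
                    new_paths[nxt] = paths[state] + [symbol]
        score, paths = new_score, new_paths
    return paths[0], score[0]              # valid codewords end with even parity
```

The search visits 2 states per position, so its cost grows linearly with block length, whereas exhaustive search over the 2^(n−1) codewords grows exponentially; codes with richer trellises trade more states for larger minimum distance.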
An adaptive post-filtering technique based on a least squares approach
A. Mustapha, S. Yeldener
This paper presents an adaptive time-domain post-filtering technique based on the least squares approach and the modified Yule-Walker (MYW) filter. Conventionally, the post-filter is derived from the original LPC spectrum. This time-domain technique generally produces an unpredictable spectral tilt that is hard to control with the modified LPC synthesis, inverse, and high-pass filtering, and hence introduces muffling into the speech. Other post-filter designs were developed in the frequency domain and can only be used in sinusoidal-based speech coders. We have developed a new time-domain post-filtering technique that eliminates the problem of spectral tilt in the speech spectrum and can be applied to various speech coders. The new post-filter has a flat frequency response at the formant peaks of the speech spectrum. It has been used in a 4 kb/s harmonic excitation linear predictive coder (HE-LPC), and subjective listening tests indicated that it outperforms the conventional technique in both one and two tandem connections.
DOI: 10.1109/SCFT.1999.781516 (published 1999-06-20)
Citations: 3
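For context, the conventional LPC-derived time-domain post-filter that the abstract contrasts against is commonly written as H(z) = A(z/γn) / A(z/γd), where A(z) = 1 + Σ a_i z^(−i) is the LPC inverse filter and 0 < γn < γd < 1. The sketch implements only this pole-zero stage with zero initial state; the γ defaults are illustrative assumptions, the tilt-compensation stage is omitted, and the paper's least-squares/MYW design is not shown.

```python
from typing import List, Sequence

def bandwidth_expand(a: Sequence[float], gamma: float) -> List[float]:
    """Coefficients of A(z/gamma): a_i -> a_i * gamma**i (a[0] should be 1)."""
    return [c * gamma ** i for i, c in enumerate(a)]

def pole_zero_filter(x: Sequence[float], num: Sequence[float],
                     den: Sequence[float]) -> List[float]:
    """Direct-form filter y = (num / den) x, assuming den[0] == 1."""
    y: List[float] = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y

def lpc_postfilter(x: Sequence[float], a: Sequence[float],
                   gamma_num: float = 0.5, gamma_den: float = 0.8) -> List[float]:
    """Formant post-filter H(z) = A(z/gamma_num) / A(z/gamma_den)."""
    return pole_zero_filter(x, bandwidth_expand(a, gamma_num),
                            bandwidth_expand(a, gamma_den))
```

Besides emphasizing formants, this pole-zero filter imposes a spectral tilt that changes with the LPC spectrum from frame to frame; the abstract's point is that compensating this tilt afterwards is hard to control, which motivates designing the post-filter response directly.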
Post noise smoother to improve low bit rate speech-coding performance
H. Tasaki, S. Takahashi
A new post-process for the CELP decoder, called a post noise smoother (PNS), is proposed to improve low bit rate speech-coding performance under various background noise conditions. In the PNS, spectral amplitude smoothing and phase randomizing are applied to the decoded speech to obtain smoothed background noise. The decoded speech, the smoothed signal, and an automatically generated imitative noise signal are each multiplied by adaptive gains and summed to form the final output speech. These gains are computed from each frame's estimated ratio of background noise to signal. Evaluation test results show that the PNS significantly improves the subjective quality of a 4-kbps speech coder under various background noise conditions.
DOI: 10.1109/SCFT.1999.781517 (published 1999-06-20)
Citations: 5
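Two operations named in the abstract, spectral amplitude smoothing and phase randomizing, can be sketched on a single frame with a direct DFT. A minimal sketch, assuming first-order recursive smoothing of magnitudes across frames (the factor `alpha` is hypothetical) and uniformly random phases mirrored with Hermitian symmetry so the output stays real. The adaptive gains, the noise-to-signal estimate, and the imitative noise generator of the actual PNS are not reproduced.

```python
import cmath
import math
import random
from typing import List, Sequence

def dft(x: Sequence[float]) -> List[complex]:
    """Direct O(N^2) DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(X: Sequence[complex]) -> List[float]:
    """Inverse DFT, keeping the real part (exact for Hermitian spectra)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def smooth_magnitudes(prev: Sequence[float], cur: Sequence[float],
                      alpha: float = 0.8) -> List[float]:
    """Recursive smoothing of magnitude spectra from frame to frame."""
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev, cur)]

def randomize_phase(mags: Sequence[float], rng: random.Random) -> List[float]:
    """Real frame with the given magnitude spectrum and random phases.
    Bins k and N-k get conjugate values; DC and Nyquist bins stay real."""
    n = len(mags)
    X = [0j] * n
    for k in range(n // 2 + 1):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            X[k] = complex(mags[k])
        else:
            X[k] = mags[k] * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
            X[n - k] = X[k].conjugate()
    return idft_real(X)
```

Keeping the magnitudes while discarding the phase preserves the frame's energy and spectral envelope but removes its waveform structure, which is what makes the result sound like stationary background noise rather than coded speech.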
A multimode transform predictive coder (MTPC) for speech and audio
S. Ramprashad
Speech and audio coding are often considered two separate technologies, each developing different signal compression techniques almost independently. At low bit rates the performance gap between the two technologies becomes noticeable: speech coders work better on speech, and audio coders perform better on music. The challenge is to merge the two technologies into a single coding paradigm that works as well as either of them regardless of the input signal. Presented is a multimode speech and audio coder that can adapt almost continuously between speech and audio coding modes. This multimode transform predictive coder (MTPC) shows improved performance on both speech and audio inputs compared with a single-mode transform predictive coder (TPC).
DOI: 10.1109/SCFT.1999.781467 (published 1999-06-20)
Citations: 27
A computational model for MOS prediction
Doh-Suk Kim, O. Ghitza, P. Kroon
A computational model to predict MOS (mean opinion score) of processed speech is proposed. The system measures the distortion of processed speech (compared to the source speech) using a peripheral model of the mammalian auditory system and a psychophysically-inspired measure, and maps the distortion value onto the MOS scale. This paper describes our attempt to derive a "universal", database-independent, distortion-to-MOS mapping function. Preliminary experimental evaluation shows that the performance of the proposed system is comparable with ITU-T recommendation P.861 for clean speech sources, and outperforms the P.861 recommendation for speech sources corrupted by either car or babble noise at 30 dB SNR.
DOI: 10.1109/SCFT.1999.781511 (published 1999-06-20)
Citations: 2
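A distortion-to-MOS mapping of the kind the authors seek is often modelled as a monotone, saturating function of the distortion value. The sketch assumes a logistic form bounded to the 1 to 5 MOS range; the parameters `a` and `b` are placeholders that would be fitted to subjective data, not values from the paper. A Pearson correlation, a common figure of merit for comparing predicted and subjective MOS, is included for evaluating such a mapping.

```python
import math
from typing import Sequence

def distortion_to_mos(d: float, a: float = 1.0, b: float = 2.0) -> float:
    """Hypothetical logistic mapping from a distortion value to the MOS scale.
    Monotonically decreasing in d (for a > 0) and bounded by 1 and 5."""
    return 1.0 + 4.0 / (1.0 + math.exp(a * (d - b)))

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and subjective scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)
```

A "universal" mapping in the abstract's sense would keep the same fitted `a` and `b` across databases; refitting them per database is exactly the dependence the authors try to avoid.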