Modeling spectral speech transitions using temporal decomposition techniques

ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing. Pub Date: 1987-04-06. DOI: 10.1109/ICASSP.1987.1169742

G. Ahlbom, F. Bimbot, G. Chollet


Citation count: 22

Abstract

Atal [1] introduced a technique for decomposing speech into phone-length temporal events, described in terms of overlapping and interacting articulatory gestures. This paper reports on simplifications of this technique, with applications to acoustic-phonetic synthesis. Spectral evolution is represented by time-indexed trajectories in the p-dimensional space of log-area ratios $y_i = \ln((1+k_i)/(1-k_i))$, where the $k_i$ are the reflection coefficients obtained from short-time stationary LPC analysis. The vocal tract configuration (spectral vector) associated with each interpolation function belongs to a finite set of articulatory targets (a vector quantization codebook). A set of speech segments ("polysons") has been encoded using this technique; it includes diphones, demi-syllables, and other units that are difficult to segment. Temporal decomposition using target spectra can break down the complex encoding of these segments. In particular, coarticulation effects are analytically explained and modeled. These new tools are shown to provide an adequate environment for our search for better rules in acoustic speech synthesis.
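The log-area ratio mapping quoted in the abstract, and the idea of drawing each spectral vector from a finite codebook of targets, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of NumPy, and the Euclidean distance used to pick the nearest target are all assumptions made for illustration. Only the formula $y_i = \ln((1+k_i)/(1-k_i))$ comes from the abstract; sign conventions for log-area ratios differ between texts.

```python
import numpy as np


def reflection_to_lar(k):
    """Log-area ratios y_i = ln((1 + k_i) / (1 - k_i)), as written in the abstract.

    Requires |k_i| < 1, which holds for a stable LPC filter.
    """
    k = np.asarray(k, dtype=float)
    if np.any(np.abs(k) >= 1.0):
        raise ValueError("reflection coefficients must satisfy |k_i| < 1")
    return np.log((1.0 + k) / (1.0 - k))


def lar_to_reflection(y):
    """Inverse mapping: since y_i = 2 * artanh(k_i), k_i = tanh(y_i / 2)."""
    return np.tanh(np.asarray(y, dtype=float) / 2.0)


def nearest_target(y, codebook):
    """Index of the codebook row closest to y.

    Euclidean distance is an assumption of this sketch; the abstract does
    not say which distortion measure was used.
    """
    diffs = np.asarray(codebook, dtype=float) - np.asarray(y, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))
```

Because the mapping is a scaled inverse hyperbolic tangent, it stretches the region near $|k_i| = 1$, and any log-area ratio vector maps back to reflection coefficients with magnitude below one.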


