5th International Conference on Spoken Language Processing (ICSLP 1998): Latest Publications

Formant diphone parameter extraction utilising a labelled single-speaker database
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-36
R. Mannell
This paper examines a method for formant parameter extraction from a labelled single-speaker database for use in a formant-parameter diphone-concatenation speech synthesis system. This procedure commences with an initial formant analysis of the labelled database, which is then used to obtain formant (F1-F5) probability spaces for each phoneme. These probability spaces guide a more careful speaker-specific extraction of formant frequencies. An analysis-by-synthesis procedure is then used to provide best-matching formant intensity and bandwidth parameters. The great majority of the parameters so extracted produce speech which is highly intelligible and which has a voice quality close to the original speaker. Synthesis techniques based upon LPC-parameter or waveform concatenation are much less vulnerable to the effects of poorly extracted parameters. The formant model is, however, more straightforwardly related to the source-filter model and thus to speech production. Whilst it is true that overlap-add concatenation of waveform-based diphones can easily model a voice with quite high fidelity, new voices and voice qualities require the recording of new speakers (or the same speaker utilising a different voice quality) and the extraction of a new diphone database. Such systems can be used to examine the effects of intonation and rhythm on voice quality or vocal affect, but formant-based systems can much more readily examine the effect of frequency-domain modifications on voice quality. Such modifications might include formant frequency shifting, bandwidth modification, modification of relative formant intensities and spectral slope variation. It is even possible, if the synthesiser design allows it, to experiment with the insertion of additional poles and zeroes into the spectrum, such as might occur when modelling the "singer's formant" for certain styles of singing voice. Such research requires a parallel formant synthesiser with a great deal of flexibility of control.
Further, and most importantly, it requires a diphone database that is extremely accurate. Formant errors must be minor and few in number and this should be achieved without excessive hand correction. Formant tracks should display, as far as possible, pole continuity across fricatives, stops and affricates. Extracted intensities and
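The "probability spaces guide extraction" step described above can be sketched roughly as follows. Everything in this sketch is an illustrative assumption: the per-phoneme formant statistics, the Gaussian z-score gating, and the function names are not taken from the paper, which does not specify its implementation.

```python
import numpy as np

# Hypothetical per-phoneme formant statistics (Hz) for F1-F3.
# The values are illustrative placeholders, not measurements from the paper.
PHONEME_STATS = {
    "a": {"mean": np.array([700.0, 1200.0, 2600.0]),
          "std":  np.array([100.0, 150.0, 200.0])},
    "i": {"mean": np.array([300.0, 2300.0, 3000.0]),
          "std":  np.array([60.0, 200.0, 250.0])},
}

def gate_candidates(candidates, phoneme):
    """Naive sketch: for each formant slot, keep the raw candidate whose
    z-score under the phoneme's formant distribution is smallest.
    (A real system would also forbid reusing one candidate twice and
    enforce track continuity across frames.)"""
    stats = PHONEME_STATS[phoneme]
    picked = []
    for mean, std in zip(stats["mean"], stats["std"]):
        z = np.abs((np.asarray(candidates) - mean) / std)
        picked.append(candidates[int(np.argmin(z))])
    return picked

# Example: noisy candidate peaks from an initial analysis of an /a/ frame.
print(gate_candidates([250.0, 680.0, 1150.0, 2550.0], "a"))  # [680.0, 1150.0, 2550.0]
```

The 250 Hz spurious peak is rejected for every slot because it lies many standard deviations from each formant's expected range, which is the intuition behind letting per-phoneme probability spaces supervise a second, more careful extraction pass.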
Citations: 23
Correspondence between the glottal gesture overlap pattern and vowel devoicing in Japanese
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-798
M. Fujimoto, E. Murano, S. Niimi, S. Kiritani
Correspondence between the glottal opening gesture pattern and vowel devoicing in Japanese was examined using PGG with special reference to the pattern of glottal gesture overlap and blending into the neighboring vowel. The results showed that most of the tokens demonstrated either a single glottal opening pattern with a devoiced vowel, or a double glottal opening with a voiced vowel during /CiC/ sequences as generally expected. Some tokens, however, showed a double glottal opening with a devoiced vowel, or a single glottal opening with a partially voiced vowel. From the viewpoint of gestural overlap analysis of vowel devoicing, an intermediate process of gestural overlap may explain the occurrence of the case in which the vowel was devoiced and showed a double phase opening. Nevertheless, the presence of a partially voiced vowel with a single opening phase clearly shows the complexity of vowel devoicing in Japanese, since there are possibly two different patterns of glottal opening (single phase and double phase), which could be observed in PGG analysis, in utterances with partially voiced vowels.
Citations: 2
Performance and optimization of the SEEVOC algorithm
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-379
Weihua Zhang, W. Holmes
In most low bit rate coders, the quality of the synthetic speech depends greatly on the performance of the spectral coding stage, in which the spectral envelope is estimated and encoded. The Spectral Envelope Estimation Vocoder (SEEVOC) is a successful spectral envelope estimation method that plays an important role in low bit rate speech coding based on the sinusoidal model. This paper investigates the properties and limitations of the SEEVOC algorithm, and shows that it can be generalized and optimized by changing the search range parameters a and b. Rules for the optimum choice of a and b are derived, based on both analysis and experimental results. The effects of noise on the SEEVOC algorithm are also investigated. Experimental results show that the SEEVOC algorithm performs better for voiced speech in the presence of noise than linear prediction (LP) analysis.
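For readers unfamiliar with SEEVOC, the basic peak-picking idea can be sketched as below: around each expected harmonic k·f0, search a window of the magnitude spectrum for the largest peak and use those peaks as envelope anchor points. Treating a and b as the one-sided search-range factors [k·f0 − a·f0, k·f0 + b·f0] is an assumption made for illustration; the paper's subject is precisely how those search-range parameters should be chosen.

```python
import numpy as np

def seevoc_envelope(mag, freqs, f0, a=0.5, b=0.5):
    """Minimal SEEVOC-style envelope sketch.

    mag   -- magnitude spectrum samples
    freqs -- corresponding frequency axis (Hz), ascending
    f0    -- fundamental frequency estimate (Hz)
    a, b  -- assumed one-sided search-range factors around each harmonic

    Around each harmonic k*f0, take the largest magnitude sample inside
    [k*f0 - a*f0, k*f0 + b*f0] as an anchor, then linearly interpolate
    the anchors across the full frequency axis.
    """
    anchors_f, anchors_m = [freqs[0]], [mag[0]]
    k = 1
    while k * f0 + b * f0 <= freqs[-1]:
        lo, hi = k * f0 - a * f0, k * f0 + b * f0
        idx = np.where((freqs >= lo) & (freqs <= hi))[0]
        peak = idx[np.argmax(mag[idx])]
        anchors_f.append(freqs[peak])
        anchors_m.append(mag[peak])
        k += 1
    return np.interp(freqs, anchors_f, anchors_m)
```

With the classic choice a = b = 0.5 the windows tile the spectrum without gaps; the paper's generalization allows asymmetric or wider ranges and studies their effect on envelope accuracy, especially under noise.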
Citations: 2
A context-dependent approach for speaker verification using sequential decision
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-229
H. Noda, Katsuya Harada, E. Kawaguchi, H. Sawai
This paper is concerned with speaker verification (SV) using the sequential probability ratio test (SPRT). In the SPRT, input samples are usually assumed to be i.i.d. samples from a probability density function, because an on-line probability computation is required. Feature vectors used in speech processing obviously do not satisfy this assumption, and therefore the correlation between successive feature vectors has not been considered in conventional SV using the SPRT. The correlation can be modeled by the hidden Markov model (HMM), but unfortunately the HMM cannot be directly applied to the SPRT because of the statistical dependence of input samples. This paper proposes a method of HMM probability computation using the mean-field approximation to resolve this problem, where the probability of the whole input sequence is nominally represented as the product of the probabilities of the individual samples, as if the input samples were independent of each other.
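The SPRT accumulation that the paper builds on can be sketched as follows. This shows only Wald's standard test on a stream of per-sample log-likelihood ratios; the paper's actual contribution, computing those per-sample probabilities from an HMM via a mean-field approximation, is not reproduced here.

```python
import math

def sprt(llrs, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test.

    llrs  -- iterable of per-sample log-likelihood ratios
             log p(x | target) / p(x | impostor)
    alpha -- allowed false-acceptance rate
    beta  -- allowed false-rejection rate

    Thresholds use Wald's approximations A = log((1-beta)/alpha) and
    B = log(beta/(1-alpha)); the test stops as soon as the cumulative
    sum leaves the (B, A) band.
    """
    upper = math.log((1 - beta) / alpha)   # accept-target threshold
    lower = math.log(beta / (1 - alpha))   # reject-target threshold
    s = 0.0
    for llr in llrs:
        s += llr
        if s >= upper:
            return "accept"
        if s <= lower:
            return "reject"
    return "undecided"  # stream ended before a decision was reached

print(sprt([1.2, 0.8, 1.5, 1.3]))  # cumulative 4.8 exceeds log(99) ≈ 4.595 → accept
```

The i.i.d. assumption enters exactly here: summing per-sample log-ratios is only valid if the joint likelihood factorizes, which is what the paper's mean-field HMM computation nominally restores for correlated speech features.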
Citations: 1
Can we hear smile?
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-106
M. Schröder, V. Aubergé, Marie-Agnès Cathiard
The amusement expression is both visual and audible in speech. After recording comparable spontaneous, acted, mechanical, reiterated and seduction stimuli, five perceptual experiments were conducted, based mainly on the hypothesis of prosodically controlled effects of amusement on speech. Results show that audio is partially independent of video, and that video performs as well as audio-video. Spontaneous speech (involuntarily controlled) can be identified against acted speech (voluntarily controlled). Amusement speech can be distinguished from seduction speech.
Citations: 24
Articulability of two consecutive morae in Japanese speech production: evidence from sound exchange errors in spontaneous speech
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-809
Y. Terao, Tadao Murata
In the present study, we discuss how the articulability of two consecutive morae plays an important role in speech production. Our assumption is based on the analysis of Japanese sound-exchange error data collected from the spontaneous speech of adults and infants. Three experiments were also carried out to confirm the reality of a unit of two consecutive morae. Phonological/phonetic characteristics were shown through the results of the experiments and related observations.
Citations: 0
Recognition from GSM digital speech
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-324
A. Gallardo-Antolín, F. Díaz-de-María, F. J. Valverde-Albacete
This paper addresses the problem of speech recognition in the GSM environment. In this context, new sources of distortion, such as transmission errors or speech coding itself, significantly degrade the performance of speech recognizers. While conventional approaches deal with these types of distortion after decoding speech, we propose to recognize from the digital speech representation of GSM. In particular, our work focuses on the 13 kbit/s RPE-LTP GSM standard speech coder. In order to test our recognizer we have compared it to a conventional recognizer in several simulated situations, which allow us to gain insight into more practical ones. Specifically, besides recognizing from clean digital speech and evaluating the influence of speech coding distortion, the proposed recognizer is faced with speech degraded by random errors, burst errors and frame substitutions. The results are very encouraging: the worse the transmission conditions are, the more recognizing from digital speech outperforms the conventional approach.
Citations: 21
Perceived prominence and acoustic parameters in American English
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-133
T. Portele
This paper describes the relationships between perceived prominence as a gradual value and some acoustic-prosodic parameters. Prominence is used as an intermediate parameter in a speech synthesis system. A corpus of American English utterances was constructed by measuring and annotating various linguistic, acoustic and perceptual parameters and features. The investigation of the corpus revealed some strong and some rather weak relations between prominence and acoustic-prosodic parameters that serve as a starting point for the development of prominence-based rules for the synthesis of American English prosody in a content-to-speech system.
Citations: 14
Data-driven extensions to HMM statistical dependencies
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-166
J. Bilmes
In this paper, a new technique is introduced that relaxes the HMM conditional independence assumption in a principled way. Without increasing the number of states, the modeling power of an HMM is increased by including only those additional probabilistic dependencies (to the surrounding observation context) that are believed to be both relevant and discriminative. Conditional mutual information is used to determine both relevance and discriminability. Extended Gaussian-mixture HMMs and new EM update equations are introduced. In an isolated word speech database, results show an average 34% word error improvement over an HMM with the same number of states, and a 15% improvement over an HMM with a comparable number of parameters.
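The selection criterion in this abstract, conditional mutual information between an observation and a candidate context variable given the state, can be estimated with a simple plug-in estimator over discretized data, sketched below. The function name and the discrete-counting approach are illustrative assumptions; the paper does not publish its estimator in this abstract.

```python
import numpy as np
from collections import Counter

def conditional_mutual_information(x, y, q):
    """Plug-in estimate of I(X; Y | Q) in bits from three aligned
    discrete sequences: sum over observed triples of
    p(x,y,q) * log2( p(x,y,q) p(q) / (p(x,q) p(y,q)) ).
    A dependency-selection scheme like the paper's would rank candidate
    context variables Y by this score and keep the most informative ones."""
    n = len(x)
    pxyq = Counter(zip(x, y, q))   # joint counts over (x, y, q)
    pxq = Counter(zip(x, q))
    pyq = Counter(zip(y, q))
    pq = Counter(q)
    mi = 0.0
    for (xi, yi, qi), c in pxyq.items():
        p_joint = c / n
        mi += p_joint * np.log2(p_joint * (pq[qi] / n)
                                / ((pxq[(xi, qi)] / n) * (pyq[(yi, qi)] / n)))
    return float(mi)
```

For perfectly dependent binary sequences the estimate is 1 bit; for independent ones it is 0, which is the ranking behaviour such a criterion relies on when deciding which context dependencies are worth adding to the HMM.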
Citations: 23
Acoustic-articulatory evaluation of the upper vowel-formant region and its presumed speaker-specific potency
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-359
F. Clermont, P. Mokhtari
We present some evidence indicating that phonetic distinctiveness and speaker individuality are indeed manifested in vowels' vocal-tract shapes estimated from the lower and the upper formant-frequencies, respectively. The methodology developed to demonstrate this dichotomy first implicates Schroeder's [8] acoustic-articulatory model, which can be coerced to yield, on a per-vowel and a per-speaker basis, area-function approximations to vocal-tract shapes of differing formant components. Using ten steady-state vowels recorded in /hVd/-context, five times at random, by four adult-male speakers of Australian English, the variability of resulting shapes aligned at mid-length was then measured on an intra- and an inter-speaker basis. Gross shapes estimated from the lower formants were indeed found to cause the largest spread amongst the vowels of individual speakers. By contrast, the more detailed shapes obtained by recruiting certain higher formants of the front and the back vowels accounted for the largest spread amongst the speakers. Collectively, these results contribute a quasi-articulatory substantiation of a long-standing view on the speaker-specific potency of the upper formant region of spoken vowels, together with some useful implications for automatic speech and speaker recognition.
Citations: 7