5th International Conference on Spoken Language Processing (ICSLP 1998): latest papers

Waveform interpolation coding with pitch-spaced subbands
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-382
W. Kleijn, Huimin Yang, E. Deprettere
We present new waveform-interpolation coding procedures which allow perfect reconstruction of the speech signal from the unquantized parameter set. Instead of using adaptive parameter extraction methods, we combine a time warping of the original signal with nonadaptive parameter extraction methods. The new coding structure has good performance at low bit rates and provides convergence to the original waveform with increasing rate.
Citations: 9
The influence of accents in Australian English vowels and their relation to articulatory tract parameters
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-208
D. Dersch, Chris Cléirigh, Julie Vonwiller
In this paper we analyse and compare a low dimensional linguistic representation of vowels with high dimensional prototypical vowel templates derived from a native Australian English speaker. We further perform the same analysis on Lebanese and Vietnamese accented English to investigate how differences due to accents impact on such a representation. In a low dimensional linguistic representation a vowel is characterised by articulatory tract parameters. To simplify the problem, the study is restricted to vowels that, notionally at least, involve a steady state articulation, i.e. a stable target configuration of tongue, lips and jaw between preceding and following articulatory transitions. Vowels are represented by the horizontal and vertical position of the part of the tongue involved in the key articulation of a particular vowel, e.g., high or low and front or back. To this is added lip posture, spread or rounded. Prototypical vowel templates are derived as follows. The sound pressure signal is parametrized by 12 mel-frequency cepstrum coefficients. At the centre of each phonetically labelled segment, 180 dimensional phone templates are extracted. For the group of short (/I/, /E/, /A/, /O/, /V/, /U/, /@/) and long vowels (/i:/, /e:/, /a:/, /o:/, /u:/, /@:/) we obtain vowel clusters by averaging over all templates of each vowel class and accent. The speech material is taken from the Australian National Database Of Spoken Language (ANDOSL). For a comparison of high dimensional vowel clusters derived from speech samples with low dimensional prototypical vowels in the articulatory tract representation we perform a reduction in dimension by a multidimensional scaling transformation in a two dimensional space. Here, a linear transformation maps a high dimensional space on a lower dimensional sub space by optimising the relative distances between data vectors. As an important result we find: i) /@/ and /@:/ are surrounded by the remaining vowels; ii) the overall structure and the relative distances between the prototypical vowels are very similar. Variations in the structure can be explained by the influence of native Australian English, Lebanese Arabic and South Vietnamese accents.
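The dimension-reduction step in this abstract can be illustrated with classical (Torgerson) multidimensional scaling, which is one linear variant of MDS. This is only a minimal sketch: the abstract does not say which MDS formulation the authors used, and the random `templates` array below is a hypothetical stand-in for the 13 averaged 180-dimensional vowel templates, not real data.

```python
import numpy as np

def classical_mds(X, n_components=2):
    """Classical (Torgerson) MDS on the rows of X, using Euclidean distances."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of rows.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Double-centre the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # The leading eigenpairs give the low-dimensional coordinates.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    top = np.clip(vals[order], 0.0, None)  # guard against tiny negative values
    return vecs[:, order] * np.sqrt(top)

# Hypothetical stand-in: 13 vowel classes (7 short + 6 long), 180 dimensions each.
rng = np.random.default_rng(0)
templates = rng.normal(size=(13, 180))
coords = classical_mds(templates, n_components=2)  # one 2-D point per vowel
```

Each row of `coords` is a point in the plane whose distances to the other rows approximate the distances between the original templates, which is what makes a visual comparison with a two-dimensional articulatory vowel chart possible.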
Citations: 0
Fly with the EAGLES: evaluation of the "ACCeSS" spoken language dialogue system
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-75
G. Hanrieder, Paul Heisterkamp, T. Brey
This paper reports the experiences we had in evaluating the ACCeSS system using the EAGLES evaluation metrics both at the input/output (black box evaluation) and component levels (glass box evaluation). We deliver an example of a complete evaluation of a continuous speech/mixed initiative system using these standards. Furthermore, we discuss some useful extensions to them.
Citations: 7
The acquisition of Putonghua phonology
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-768
L. So, Zhou Jing
Citations: 0
A novel robust speech recognition algorithm based on multi-models and integrated decision method
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-334
Shengxi Pan, Jia Liu, Jintao Jiang, Zuoying Wang, Dajin Lu
In this paper, a new robust speech recognition algorithm based on multi-models and integrated decision (MMID) is proposed, and a parallel MMID (PMMID) algorithm is developed. With this algorithm the advantages of different models can be integrated into one system. The algorithm uses different acoustic models at the same time, based on the DDBHMM (duration distribution based hidden Markov model) [2]. These models include the channel-mismatch-correct (CMC) model, the more-alternative-pronunciation model, tone and non-tone models of Mandarin Chinese speech, the voice activity detection (VAD) model and the state-skip model. In adverse environments, the speech recognition accuracy of the multi-model system is better than that of the single-model system. The experimental results show that the error rate of the recognition system is 2.9%, a reduction of 81% compared with the single-model baseline system.
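The two figures quoted in this abstract can be related by simple arithmetic. The sketch below assumes the 81% is a relative error reduction, which the abstract does not state explicitly, and the function name is hypothetical. The baseline it computes is therefore an implied value under that assumption, not a figure reported by the authors.

```python
def implied_baseline_error(new_error_pct: float, relative_reduction: float) -> float:
    """Baseline error rate implied by a new error rate and a relative reduction.

    Solves (baseline - new) / baseline = relative_reduction for baseline.
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must lie in [0, 1)")
    return new_error_pct / (1.0 - relative_reduction)

# Figures from the abstract: 2.9% error after an 81% reduction (assumed relative).
baseline = implied_baseline_error(2.9, 0.81)
print(f"implied baseline error rate: {baseline:.1f}%")
```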
Citations: 0
Temporal variables in lectures in the Japanese language
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-842
Michiko Watanabe
In second language input studies, speaking speed is regarded as one of the most influential factors in comprehension. However, research in this area has mainly been conducted on written texts read aloud. The present study investigated temporal variables, such as articulation rate and the ratio and frequency of fillers and silent pauses, in three university lectures given in Japanese. It was found that the total duration ratio of fillers was as great as that of silent pauses. It also became clear that, for individual speakers, articulation rate and frequency of fillers are relatively constant, while frequency of silent pauses varies depending on discourse section. Of total pause ratio, pause frequency and articulation rate, the last correlated best with listener ratings of speech speed. The findings suggest that spontaneous speech requires methods of speech speed measurement different from those for read speech.
Citations: 0
A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-16
A. Acero
This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is highly coupled with a compression scheme that results in compact acoustic inventories. When compared to the approach in the Whistler system using no mixed excitation, the new method shows improvement in voiced fricatives and over-stretched voiced sounds. In addition, it allows for spectral manipulation such as smoothing of discontinuities at unit boundaries, voice transformations or loudness equalization.
Citations: 3
Language independent and language adaptive large vocabulary speech recognition
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-751
Tanja Schultz, A. Waibel
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Turkish. Based on a global phoneme set we built different multilingual speech recognition systems for five of the 15 languages. Context dependent phoneme models are created in a data-driven manner by introducing questions about language and language groups to our polyphone clustering procedure. We apply the resulting multilingual models to unseen languages and present several recognition results in language independent and language adaptive setups.
Citations: 90
Phonetic and phonological markers of contrastive focus in Korean
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-151
Sun-Ah Jun, Hyuck-Joon Lee
Cross-linguistically, focus is often cued by suprasegmental features and changes in phrasing. In this paper, phonetic and phonological markers of contrastive focus in Korean are investigated. We find that, as a phonological marker, focus initiates an accentual phrase (AP), and tends to, but does not always, include the following words in the same AP. But regardless of whether the post-focus sequence is dephrased or not, there is a significant expansion of the focused peak compared to the peak on the following words, thus achieving the perceptual goal of focus: prominence of the focused word relative to the following items. As a phonetic marker, a focused AP has extra-strengthening on its left edge, and the sequence before and after focus tends to be shorter than that in a neutral sentence.
Citations: 65
Improved utterance rejection using length dependent thresholds
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-425
Sunil K. Gupta, F. Soong
In this paper, we propose to use an utterance length (duration) dependent threshold for rejecting an unknown input utterance with a general speech (garbage) model. A general speech model, compared with more sophisticated anti-subword models, is a more viable solution to the utterance rejection problem for low-cost applications with stringent storage and computational constraints. However, the rejection performance using such a general model with a fixed, universal rejection threshold is in general worse than that of anti-models with higher discrimination. Without adding complexity to the rejection algorithm, we propose to vary the rejection threshold according to the utterance length. The experimental results show that significant improvement in rejection performance can be obtained by using the proposed length dependent rejection threshold over a fixed threshold. We investigate utterance rejection in a command phrase recognition task. The equal error rate, a good figure of merit for calibrating the performance of utterance verification algorithms, is reduced by almost 23% when the proposed length dependent threshold is used.
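The idea of a length dependent rejection threshold can be sketched as follows. Everything specific here is an assumption made for illustration: the abstract does not give the functional form of the threshold, so the `base + slope / n_frames` rule, the parameter values, the direction of the adjustment and the notion of a per-frame confidence score against the general speech model are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LengthDependentThreshold:
    """Rejection rule whose threshold depends on utterance length (in frames).

    Hypothetical form: threshold(n) = base + slope / n, so short utterances
    must clear a higher bar than long ones. The paper may use another form.
    """
    base: float = 0.0   # threshold approached for very long utterances
    slope: float = 1.0  # extra margin that shrinks as the utterance gets longer

    def threshold(self, n_frames: int) -> float:
        # Guard against zero-length input.
        return self.base + self.slope / max(n_frames, 1)

    def accept(self, score: float, n_frames: int) -> bool:
        # `score` is assumed to be a per-frame confidence score measured
        # against the general speech (garbage) model; higher means more
        # likely to be a valid command phrase.
        return score >= self.threshold(n_frames)

rule = LengthDependentThreshold(base=0.5, slope=20.0)
short_ok = rule.accept(score=0.8, n_frames=40)   # threshold is 1.0 here
long_ok = rule.accept(score=0.8, n_frames=200)   # threshold is 0.6 here
```

With a fixed threshold both utterances would receive the same decision. Here the same score is rejected for the 40-frame utterance and accepted for the 200-frame one, and the rule adds only one extra arithmetic operation per decision, in keeping with the low-complexity goal stated in the abstract.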
Citations: 7