We present new waveform-interpolation coding procedures which allow perfect reconstruction of the speech signal from the unquantized parameter set. Instead of using adaptive parameter extraction methods, we combine a time warping of the original signal with nonadaptive parameter extraction methods. The new coding structure has good performance at low bit rates and provides convergence to the original waveform with increasing rate.
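The time-warping step that replaces adaptive parameter extraction can be illustrated with a minimal resampling sketch (an assumption-laden toy, not the authors' implementation): the signal is evaluated at warped, monotonically increasing time instants via linear interpolation, so that pitch cycles are aligned before the nonadaptive, fixed-rate parameter extraction is applied.

```python
def time_warp(signal, warp):
    """Resample `signal` at the (monotonic) warped time instants `warp`
    using linear interpolation between neighbouring samples."""
    out = []
    n = len(signal)
    for t in warp:
        i = min(int(t), n - 2)   # left neighbour index, clamped
        frac = t - i             # fractional position between samples
        out.append((1.0 - frac) * signal[i] + frac * signal[i + 1])
    return out
```

Because linear interpolation is invertible for a monotonic warp, the original timing can be restored from the warp function, which is what makes perfect reconstruction from the unquantized parameters possible in principle.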
W. Kleijn, Huimin Yang, E. Deprettere, "Waveform interpolation coding with pitch-spaced subbands," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-382
In this paper we analyse and compare a low dimensional linguistic representation of vowels with high dimensional prototypical vowel templates derived from a native Australian English speaker. We further perform the same analysis on Lebanese and Vietnamese accented English to investigate how differences due to accents impact on such a representation. In a low dimensional linguistic representation a vowel is characterised by articulatory tract parameters. To simplify the problem, the study is restricted to vowels that, notionally at least, involve a steady state articulation, i.e. a stable target configuration of tongue, lips and jaw between preceding and following articulatory transitions. Vowels are represented by the horizontal and vertical position of the part of the tongue involved in the key articulation of a particular vowel, e.g. high or low and front or back. To this is added lip posture, spread or rounded. Prototypical vowel templates are derived as follows. The sound pressure signal is parametrised by 12 mel-frequency cepstrum coefficients. At the centre of each phonetically labelled segment, 180-dimensional phone templates are extracted. For the group of short (/I/, /E/, /A/, /O/, /V/, /U/, /@/) and long vowels (/i:/, /e:/, /a:/, /o:/, /u:/, /@:/) we obtain vowel clusters by averaging over all templates of each vowel class and accent. The speech material is taken from the Australian National Database Of Spoken Language (ANDOSL). To compare the high dimensional vowel clusters derived from speech samples with the low dimensional prototypical vowels in the articulatory tract representation, we reduce the dimension with a multidimensional scaling transformation into a two dimensional space. Here, a linear transformation maps a high dimensional space onto a lower dimensional subspace by optimising the relative distances between data vectors. As important results we find:
i) /@/ and /@:/ are surrounded by the remaining vowels; ii) the overall structure and the relative distances between the prototypical vowels are very similar. Variations in the structure can be explained by the influence of native Australian English, Lebanese Arabic and South Vietnamese accents.
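The distance-preserving dimension reduction described above can be sketched with classical (linear) multidimensional scaling; a minimal numpy version, not necessarily the exact transformation used in the paper:

```python
import numpy as np

def classical_mds(X, k=2):
    """Project the row vectors of X to k dimensions, preserving pairwise
    Euclidean distances as well as a linear map allows (classical MDS)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                  # centring matrix
    B = -0.5 * J @ D2 @ J                                # Gram matrix
    w, V = np.linalg.eigh(B)                             # ascending eigenvalues
    top = np.argsort(w)[::-1][:k]                        # k largest
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```

Applied to the 180-dimensional averaged vowel templates, the two MDS coordinates can then be compared directly against the two articulatory axes (tongue height and frontness).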
D. Dersch, Chris Cléirigh, Julie Vonwiller, "The influence of accents in Australian English vowels and their relation to articulatory tract parameters," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-208
This paper reports our experience in evaluating the ACCeSS system using the EAGLES evaluation metrics, both at the input/output level (black-box evaluation) and at the component level (glass-box evaluation). We provide an example of a complete evaluation of a continuous-speech, mixed-initiative system using these standards, and we discuss some useful extensions to them.
G. Hanrieder, Paul Heisterkamp, T. Brey, "Fly with the EAGLES: evaluation of the 'ACCeSS' spoken language dialogue system," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-75
L. So, Zhou Jing, "The acquisition of Putonghua phonology," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-768
In this paper, a new robust speech recognition algorithm based on multiple models and integrated decision (MMID) is proposed, and a parallel MMID (PMMID) algorithm is developed. With this algorithm, the advantages of different models can be integrated into one system. The algorithm uses different acoustic models at the same time, all based on DDBHMM (duration distribution based hidden Markov model) [2]. These models include a channel-mismatch-correction (CMC) model, a model with additional alternative pronunciations, tone and non-tone models for Mandarin Chinese speech, a voice activity detection (VAD) model, and a state-skip model. The recognition accuracy of the multi-model system is better than that of a single-model system in adverse environments. Experimental results show that the error rate of the recognition system is 2.9%, a reduction of 81% compared with the single-model baseline system.
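The integrated-decision idea can be sketched as follows (a toy combination rule with hypothetical weights and scores, not the paper's DDBHMM machinery): each model scores every hypothesis, and the system selects the hypothesis with the best weighted combined score.

```python
def integrated_decision(model_scores, weights=None):
    """model_scores: one dict per model, mapping hypothesis -> log score.
    Returns the hypothesis with the highest weighted sum of per-model
    scores, i.e. a simple integrated decision across all models."""
    if weights is None:
        weights = [1.0] * len(model_scores)
    combined = {}
    for w, scores in zip(weights, model_scores):
        for hyp, s in scores.items():
            combined[hyp] = combined.get(hyp, 0.0) + w * s
    return max(combined, key=combined.get)
```

A hypothesis that only one model favours can thus be outvoted by the others, which is the intuition behind the accuracy gain of the multi-model system in adverse environments.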
Shengxi Pan, Jia Liu, Jintao Jiang, Zuoying Wang, Dajin Lu, "A novel robust speech recognition algorithm based on multi-models and integrated decision method," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-334
In second language input studies, speaking speed is regarded as one of the most influential factors in comprehension. However, research in this area has mainly been conducted on written texts read aloud. The present study investigated temporal variables, such as articulation rate and the ratio and frequency of fillers and silent pauses, in three university lectures given in Japanese. It was found that the total duration ratio of fillers was as great as that of silent pauses. It also became clear that, for individual speakers, articulation rate and the frequency of fillers are relatively constant, while the frequency of silent pauses varies depending on the discourse section. Of total pause ratio, pause frequency and articulation rate, articulation rate correlated best with listener ratings of speech speed. The findings suggest that spontaneous speech requires methods of speech speed measurement different from those for read speech.
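The temporal variables in question can be computed from a labelled segmentation of the recording. A sketch, under the assumptions that each segment is a (start, end, kind) triple with kind one of 'speech', 'filler', or 'pause', and that articulation rate is syllables per second of speech time excluding pauses and fillers:

```python
def temporal_variables(segments, n_syllables):
    """segments: iterable of (start_sec, end_sec, kind), kind in
    {'speech', 'filler', 'pause'}.  Returns articulation rate
    (syllables per second of speech) and the total-duration ratios
    of fillers and silent pauses."""
    dur = {"speech": 0.0, "filler": 0.0, "pause": 0.0}
    for start, end, kind in segments:
        dur[kind] += end - start
    total = sum(dur.values())
    return {
        "articulation_rate": n_syllables / dur["speech"],
        "filler_ratio": dur["filler"] / total,
        "pause_ratio": dur["pause"] / total,
    }
```

Comparing `filler_ratio` against `pause_ratio` per lecture is the kind of measurement behind the finding that fillers occupy as much total time as silent pauses.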
Michiko Watanabe, "Temporal variables in lectures in the Japanese language," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-842
This paper presents a time-scale and pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. The model is tightly coupled with a compression scheme that results in compact acoustic inventories. When compared with the approach in the Whistler system, which uses no mixed excitation, the new method shows improvement on voiced fricatives and over-stretched voiced sounds. In addition, it allows for spectral manipulation such as smoothing of discontinuities at unit boundaries, voice transformation, or loudness equalization.
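The mixed-excitation source can be illustrated as a blend of a periodic pulse train and noise. A toy time-domain sketch (the paper's model mixes in the frequency domain with per-band voicing, which is omitted here):

```python
import random

def mixed_excitation(n_samples, pitch_period, voicing, seed=0):
    """Blend a periodic pulse train with Gaussian noise.
    voicing in [0, 1]: 1 -> pure pulse train, 0 -> pure noise."""
    rng = random.Random(seed)
    out = []
    for i in range(n_samples):
        pulse = 1.0 if i % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 0.3)
        out.append(voicing * pulse + (1.0 - voicing) * noise)
    return out
```

Intermediate voicing values are what let such a source model voiced fricatives, which contain both periodic and noisy energy, better than a purely pulsed excitation.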
A. Acero, "A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-16
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Turkish. Based on a global phoneme set we built different multilingual speech recognition systems for five of the 15 languages. Context dependent phoneme models are created data-driven by introducing questions about language and language groups to our polyphone clustering procedure. We apply the resulting multilingual models to unseen languages and present several recognition results in language independent and language adaptive setups.
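At each node, data-driven clustering of this kind chooses the question (here including questions about language or language group) that maximises the training-data likelihood gain. A one-dimensional, single-Gaussian sketch of that criterion, a deliberate simplification of actual polyphone clustering over high-dimensional acoustic statistics:

```python
import math

def gaussian_ll(values):
    """ML log-likelihood of 1-D data under a single Gaussian."""
    n = len(values)
    if n < 2:
        return 0.0
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, 1e-8)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def split_gain(values, answers):
    """Likelihood gain from splitting the data by a yes/no question
    (e.g. 'is the language in this group?')."""
    yes = [v for v, a in zip(values, answers) if a]
    no = [v for v, a in zip(values, answers) if not a]
    return gaussian_ll(yes) + gaussian_ll(no) - gaussian_ll(values)
```

A language question wins a split only when it separates the data better than the competing phonetic-context questions, which is how the procedure decides where models can be shared across languages.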
Tanja Schultz, A. Waibel, "Language independent and language adaptive large vocabulary speech recognition," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-751
Cross-linguistically, focus is often cued by suprasegmental features and changes in phrasing. In this paper, phonetic and phonological markers of contrastive focus in Korean are investigated. We find that, as a phonological marker, focus initiates an accentual phrase (AP), and tends to, but does not always, include the following words in the same AP. But regardless of whether the post-focus sequence is dephrased or not, there is a significant expansion of the focused peak compared to the peak on the following words, thus achieving the perceptual goal of focus: prominence of the focused word relative to the following items. As a phonetic marker, a focused AP has extra-strengthening on its left edge, and the sequence before and after focus tends to be shorter than that in a neutral sentence.
Sun-Ah Jun, Hyuck-Joon Lee, "Phonetic and phonological markers of contrastive focus in Korean," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-151
In this paper, we propose to use an utterance length (duration) dependent threshold for rejecting an unknown input utterance with a general speech (garbage) model. A general speech model, compared with more sophisticated anti-subword models, is a more viable solution to the utterance rejection problem for low-cost applications with stringent storage and computational constraints. However, the rejection performance of such a general model with a fixed, universal rejection threshold is in general worse than that of anti-models with higher discrimination. Without adding complexity to the rejection algorithm, we propose to vary the rejection threshold according to the utterance length. Experimental results on a command phrase recognition task show that the proposed length dependent rejection threshold yields significant improvement in rejection performance over a fixed threshold: the equal error rate, a good figure of merit for calibrating the performance of utterance verification algorithms, is reduced by almost 23%.
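The proposal can be sketched as a rejection rule whose threshold is a function of utterance length. The linear interpolation shape and all endpoint values below are hypothetical illustrations, not the paper's tuned settings:

```python
def rejection_threshold(n_frames, short_len=50, long_len=300,
                        short_thr=0.8, long_thr=0.4):
    """Interpolate linearly between a stricter threshold for short
    utterances and a looser one for long utterances."""
    if n_frames <= short_len:
        return short_thr
    if n_frames >= long_len:
        return long_thr
    frac = (n_frames - short_len) / (long_len - short_len)
    return short_thr + frac * (long_thr - short_thr)

def accept(confidence, n_frames):
    """Accept the utterance only if its confidence score against the
    general speech (garbage) model beats the length-dependent threshold."""
    return confidence >= rejection_threshold(n_frames)
```

The same confidence score is thus judged more strictly for short utterances, where a general garbage model gives the least reliable evidence, without any change to the scoring algorithm itself.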
Sunil K. Gupta, F. Soong, "Improved utterance rejection using length dependent thresholds," in Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998). doi:10.21437/ICSLP.1998-425