
5th International Conference on Spoken Language Processing (ICSLP 1998): latest papers

Acoustic indicators of topic segmentation
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-582
Julia Hirschberg, C. H. Nakatani
The segmentation of text and speech into topics and subtopics is an important step in document interpretation. For text, formatting information, such as headings and paragraphing, is available to aid in this endeavor, although this information is by no means sufficient. For speech, the task is even more difficult. We present results of the application of machine learning techniques to the automatic identification of intonational phrases beginning and ending 'topics' determined independently by annotators for two corpora: the Boston Directions Corpus and the Broadcast News (HUB-4) DARPA/NIST database.
Citations: 88
Dealing with out-of-vocabulary words and speech disfluencies in an n-gram based speech understanding system
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-648
A. Kai, Y. Hirose, S. Nakagawa
In this study, we investigate the effectiveness of an unknown word processing (UWP) algorithm, which is incorporated into an N-gram language model based speech recognition system for dealing with filled pauses and out-of-vocabulary (OOV) words. We have already investigated the effect of the UWP algorithm, which utilizes a simple subword sequence decoder, in a spoken dialog system using a context free grammar (CFG) as a language model. Here, the effect of the UWP algorithm was investigated using an N-gram based continuous speech recognition system on both a small dialog task and a large-vocabulary read speech dictation task. The experimental results showed that the UWP improves recognition accuracy, and that an N-gram based system with the UWP can improve understanding performance compared with a CFG-based system.
Citations: 27
Context dependent tree based transforms for phonetic speech recognition
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-645
Bernard Doherty, S. Vaseghi, P. McCourt
This paper presents a novel method for modeling phonetic context using linear context transforms. Initial investigations have shown the feasibility of synthesising context dependent models from context independent models through weighted interpolation of the peripheral states of a given hidden Markov model with its adjacent model. This idea can be further extended to maximum likelihood estimation of not only single weights, but also a matrix of weights or a transform. This paper outlines the application of Maximum Likelihood Linear Regression (MLLR) as a means of modeling context dependency in continuous density Hidden Markov Models (HMMs).
Citations: 0
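As a rough illustration of the two ideas named in the abstract above, the following minimal sketch blends a peripheral state's mean with the adjacent model's state mean using a single weight, then shows that the same result can be written as an affine transform of the mean, mu' = A mu + b, which is the form MLLR uses for Gaussian means. The function names, the weight value, and the toy vectors are illustrative assumptions rather than details from the paper, and maximum likelihood estimation of the weights or the transform is not shown.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def interpolate_means(own: Vector, neighbour: Vector, weight: float) -> Vector:
    """Blend a state's mean with the adjacent model's state mean.

    weight = 1.0 leaves the context-independent mean unchanged;
    smaller weights pull it toward the neighbouring model.
    """
    return [weight * o + (1.0 - weight) * n for o, n in zip(own, neighbour)]


def affine_transform_mean(A: Matrix, b: Vector, mean: Vector) -> Vector:
    """Apply mu' = A mu + b, the mean transform form used in MLLR."""
    return [sum(a_ij * m for a_ij, m in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


# Toy 2-dimensional means (hypothetical values).
ci_mean = [1.0, 2.0]
adjacent_mean = [3.0, 0.0]

blended = interpolate_means(ci_mean, adjacent_mean, weight=0.75)

# A scalar interpolation weight is a special case of a matrix transform:
# with A = w * I and b = (1 - w) * neighbour, both routes give the same mean.
w = 0.75
A = [[w, 0.0], [0.0, w]]
b = [(1.0 - w) * x for x in adjacent_mean]
transformed = affine_transform_mean(A, b, ci_mean)
```

Writing the interpolation as a special case of the affine form is one way to see why moving from single weights to a full matrix is a natural extension: a full matrix `A` simply lets each output dimension depend on every input dimension.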
What spreads, and how? Tonal rightward spreading on Shanghai disyllabic compounds
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-145
X. Zhu
The present paper examines what kinds of lexical tone sandhi Shanghai disyllabic compounds undergo, and especially in what sense and to what extent a disyllabic tone can be claimed to result from rightward spreading of the corresponding citation tone. It will be shown that F0 spreading occurs in the Long tone domains, while Contour element spreading occurs mainly in the Short tone domains.
Citations: 0
Phonetic alignment: speech synthesis based vs. hybrid HMM/ANN
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-595
F. Malfrère, O. Deroo, T. Dutoit
In this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on aligning the speech signal with a high quality synthetic speech pattern, and the second uses a hybrid HMM/ANN system. Both systems have been evaluated on manually segmented French read utterances from a speaker never seen in the training stage of the HMM/ANN system. This study outlines the advantages and drawbacks of both methods. The high quality speech synthesis system has the great advantage that no training stage is needed, while the classical HMM/ANN system easily allows multiple phonetic transcriptions. We derive a method for the automatic construction of phonetically labeled speech databases that uses the synthetic speech segmentation tool to bootstrap the training process of our hybrid HMM/ANN system. Such segmentation tools will be key to the development of improved speech synthesis and recognition systems.
Citations: 35
The importance of the first syllable in English spoken word recognition by adult Japanese speakers
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-764
Kazuo Nakayama, Kaoru Tomita-Nakayama
We investigated adult Japanese speakers' deficiencies in English spoken word recognition. We found that accurate recognition of the first syllable, or the initial portion, of each word played an important role in recognizing the word correctly. The study implied that their recognition performance would be enhanced by speech processing methods such as time-scale expansion and/or dynamic range compression. Although approximately 85 percent of English words begin with strong syllables [1], many of them do not carry sentence stress and are not pronounced as clearly as isolated words. Moreover, the duration of a word, especially an initial word, is so short that the listener cannot recognize it correctly. Two experiments were conducted in an anechoic room. In the first experiment, subjects listened to extracted words and the corresponding isolated English words, which included words without primary stress on the first syllable. We found that they had difficulty recognizing both the isolated words and the extracted words, especially when the word did not begin with a strong syllable, which sounded somewhat unclear. This is quite frequent in normal English speech. We confirmed that they had difficulty recognizing words that began with weak syllables, and we conclude that the first syllable plays an important role in word recognition, at least for Japanese speakers. In the second experiment, the extracted words and the corresponding time-scale expanded words (henceforth, expanded words) were presented. The results indicated that the expanded words were better recognized. The time-scale modification (henceforth, TSM) of the extracted words did not lose intelligibility even at a ratio of around 2.00, as was clear from the fact that recognition improved.
Citations: 0
Computer-mediated input and the acquisition of L2 vowels
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-844
M. Fox
Programs for testing and training of difficult vowel distinctions in American English were created for subjects to access via the Internet using a web browser. The testing and training data include many likely vowel confusions for speakers of different L1s. The training program focuses on one distinction at a time, and adjusts to concentrate on particular contexts or exemplars that are difficult for the individual subject. In the current study, 52 subjects participated in testing and 2 subjects participated in training. In the testing portion, results indicate that the L1 and the fluency level in English, as well as individual variability, have an effect on perceptual ability. In the training portion, subjects showed significant improvement on the contrasts on which they trained. Because these programs make extensive data collection over large populations and large distances easy, this method of research will facilitate further investigation of questions regarding second language acquisition.
Citations: 1
A very low bit rate speech coder using HMM with speaker adaptation
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-375
T. Masuko, K. Tokuda, Takao Kobayashi
This paper describes a speaker adaptation technique for a phonetic vocoder based on HMMs. In the vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using an HMM-based speech synthesis technique. One of the main problems of this vocoder is that the voice characteristics of the synthetic speech depend on the HMMs used in the decoder, and are therefore fixed regardless of the input speaker. To overcome this problem, we adapt the HMMs to the input speech by transmitting transfer vectors, which carry information on the mismatch between the input speech and the HMMs. The results of subjective tests show that the performance of the proposed vocoder without quantization of transfer vectors is comparable to that of a speaker dependent vocoder.
Citations: 12
A four layer sharing HMM system for very large vocabulary isolated word recognition
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-284
Ruxin Chen, Miyuki Tanaka, Duanpei Wu, L. Olorenshaw, Mariscela Amador
This paper reports on a large vocabulary, speaker independent, isolated word recognizer targeting 50,000 words. The system supports a unique four-layer sharing structure for either continuous or discrete HMMs. Evaluation is performed using a dictionary of 5000 US city names, a dictionary of the 5000 most frequent English words, a dictionary of 50,000 English words, and the 110,000-word CMU English dictionary. For these dictionaries, recognition accuracy ranges from 90% to 93% for the top 3 results.
Citations: 6
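The abstract above reports accuracy "for the top 3 results". One common reading of that metric, assumed here, is that an utterance counts as correct when the reference word appears among the recognizer's three best hypotheses. The sketch below computes it on made-up data; the function name and the word lists are hypothetical and are not drawn from the paper's dictionaries or results.

```python
from typing import Sequence


def top_n_accuracy(ranked_hypotheses: Sequence[Sequence[str]],
                   references: Sequence[str], n: int = 3) -> float:
    """Fraction of utterances whose reference is among the first n hypotheses."""
    if len(ranked_hypotheses) != len(references):
        raise ValueError("one hypothesis list is required per reference")
    if not references:
        raise ValueError("at least one utterance is required")
    hits = sum(1 for hyps, ref in zip(ranked_hypotheses, references)
               if ref in hyps[:n])
    return hits / len(references)


# Hypothetical ranked recognizer output for four isolated-word utterances.
hypotheses = [
    ["boston", "austin", "houston"],
    ["denver", "dover", "danvers"],
    ["salem", "selma", "seldom", "san jose"],
    ["reno", "rhino", "romeo"],
]
references = ["austin", "denver", "san jose", "reno"]

score = top_n_accuracy(hypotheses, references, n=3)
```

In this toy data the third utterance's reference is ranked fourth, so it counts as a miss at n = 3 but as a hit at n = 4.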
An efficient mel-LPC analysis method for speech recognition
Pub Date : 1998-11-30 DOI: 10.21437/ICSLP.1998-536
H. Matsumoto, Y. Nakatoh, Y. Furuhata
This paper proposes a simple and efficient time domain technique to estimate an all-pole model on a mel-frequency axis (Mel-LPC). This method requires only about twice the computational cost of conventional linear prediction analysis. The recognition performance of mel-cepstral parameters obtained by the Mel-LPC analysis is compared with those of conventional LP mel-cepstra and the mel-frequency cepstrum coefficients (MFCC) through gender-dependent phoneme and word recognition tests. The results show that the Mel-LPC cepstrum attains a significant improvement in recognition accuracy over the conventional LP mel-cepstrum, and gives slightly higher accuracy for male speakers and slightly lower accuracy for female speakers than MFCC.
Citations: 26
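For orientation, the "conventional linear prediction analysis" that the abstract above uses as its cost baseline is commonly implemented with the autocorrelation method and the Levinson-Durbin recursion. The sketch below shows that baseline only; it does not reproduce the Mel-LPC method, which estimates the model on a mel-frequency axis. The function names and the toy signal are illustrative assumptions.

```python
from typing import List, Tuple


def autocorrelation(signal: List[float], max_lag: int) -> List[float]:
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the LPC normal equations from autocorrelation values.

    Returns (a, error) with a[0] = 1, where the all-pole model is
    1 / (1 + a[1] z^-1 + ... + a[order] z^-order) and error is the
    remaining prediction error energy.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        error *= (1.0 - k * k)
    return a, error


# Toy signal: a decaying exponential x[n] = 0.5 ** n, which a
# first-order all-pole model with a[1] close to -0.5 describes well.
signal = [0.5 ** n for n in range(50)]
r = autocorrelation(signal, max_lag=2)
coeffs, residual_energy = levinson_durbin(r, order=2)
```

The recursion costs on the order of `order ** 2` operations once the autocorrelation values are available, so the autocorrelation step usually dominates for typical frame lengths.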