
Latest publications from the IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01)

Example-based query generation for spontaneous speech
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034639
H. Murao, Nobuo Kawaguchi, S. Matsubara, Y. Inagaki
This paper proposes a new query generation method based on examples of human-to-human dialogue. Along with modeling the information flow in dialogue, a system for in-car information retrieval has been designed. The system searches the dialogue corpus for an example similar to the input speech and generates a query from that example. Experimental results show the effectiveness of this method.
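The retrieve-and-reuse loop the abstract describes can be sketched as follows. The corpus entries, the Jaccard word-overlap similarity, and the slot-style query format are all invented here for illustration; the paper's actual corpus and query representation are not specified in this listing.

```python
# Hypothetical sketch of example-based query generation: find the corpus
# example most similar to the input speech, then reuse its query.
# Corpus contents, similarity measure, and query format are assumptions.

def similarity(a: str, b: str) -> float:
    """Jaccard word-overlap between two utterances (an assumed choice)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

# Invented dialogue corpus: (utterance, query) pairs.
corpus = [
    ("find an italian restaurant nearby", {"type": "restaurant", "cuisine": "italian"}),
    ("where is the nearest gas station", {"type": "gas_station"}),
]

def make_query(speech: str) -> dict:
    # Pick the corpus example most similar to the input speech.
    _, best_query = max(corpus, key=lambda ex: similarity(speech, ex[0]))
    return best_query

print(make_query("find a cheap italian restaurant"))
# {'type': 'restaurant', 'cuisine': 'italian'}
```

A real system would match against recognized (and possibly misrecognized) speech, so a robust similarity over word or phone sequences matters more than this toy overlap suggests.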
Cited by: 10
Error analysis using decision trees in spontaneous presentation speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034621
T. Shinozaki, S. Furui
This paper proposes the use of decision trees for analyzing errors in spontaneous presentation speech recognition. The trees are designed to predict whether a word or a phoneme can be correctly recognized, using word or phoneme attributes as inputs. The trees are constructed from training "cases" by choosing questions about attributes step by step according to the gain ratio criterion. The errors in recognizing spontaneous presentations given by 10 male speakers were analyzed, and the ability of the attributes to explain the recognition errors was quantitatively evaluated. A restricted set of attributes closely related to the recognition errors was identified for both words and phonemes.
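The gain ratio criterion mentioned above is the C4.5-style split measure: information gain normalized by the split's own entropy. A minimal sketch, with invented word attributes and labels standing in for the paper's cases:

```python
# Gain-ratio computation (C4.5-style); the attribute/label data are invented.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting `labels` by attribute `values`,
    normalized by the split information (entropy of the attribute itself)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - cond
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0

# Toy attribute: is the word a filler?  Label: was it recognized correctly?
filler  = ["yes", "yes", "no", "no"]
correct = [False, False, True, True]
print(gain_ratio(filler, correct))  # perfect split -> 1.0
```

The tree builder would evaluate this ratio for every candidate attribute question at each node and choose the highest.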
Cited by: 27
Beyond the Informedia digital video library: video and audio analysis for remembering conversations
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034646
Alexander Hauptmann, Wei-Hao Lin
The Informedia Project digital video library pioneered the automatic analysis of television broadcast news and its retrieval on demand. Building on that system, we have developed a wearable, personalized Informedia system, which listens to and transcribes the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can retrieve the audio from the last conversation, replaying in compressed form the names and major issues that were mentioned. All of this happens unobtrusively, somewhat like an intelligent assistant who whispers to you: "That's Bob Jones from Tech Solutions; two weeks ago in London you discussed solar panels". This paper outlines the general system components as well as interface considerations. Initial implementations showed that both face recognition methods and speaker identification technology have serious shortfalls that must be overcome.
Cited by: 5
Robust and efficient confidence measure for isolated command recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034681
G. Hernández-Abrego, X. Menéndez-Pidal, L. Olorenshaw
A new confidence measure for isolated command recognition is presented. It is versatile and efficient in several ways. First, it is based exclusively on the speech recognizer's output. In addition, it is robust to changes in the vocabulary, acoustic model, and parameter settings. Its calculation is very simple, based on computing a pseudo-filler score from an N-best list. Performance is tested in two different command recognition applications. The measure efficiently separates correct results both from incorrect ones and from false alarms caused by out-of-vocabulary elements and noise.
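The abstract only says the pseudo-filler score is computed from the N-best list; one plausible reading, sketched below, approximates the filler score as the average of the competing hypotheses' scores and takes the margin over it as confidence. The exact formulation in the paper may differ.

```python
# Hedged sketch of an N-best-based confidence measure.  The "pseudo-filler"
# score is approximated here as the mean of the competitor scores; this is
# an assumption, not necessarily the paper's exact formula.

def confidence(nbest_scores):
    """nbest_scores: acoustic log-likelihood scores, best hypothesis first."""
    best, rest = nbest_scores[0], nbest_scores[1:]
    pseudo_filler = sum(rest) / len(rest)   # filler model built from competitors
    return best - pseudo_filler             # large margin -> high confidence

# A clear winner yields a large positive margin; a near-tie yields ~0.
print(confidence([-100.0, -110.0, -112.0, -130.0]))
```

Because it uses only recognizer output, such a measure needs no extra filler model training, which is what makes it cheap and vocabulary-independent.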
Cited by: 4
Comparison of standard and hybrid modeling techniques for distributed speech recognition
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034608
J. Stadermann, G. Rigoll
Distributed speech recognition (DSR) is an interesting technology for mobile recognition tasks where the recognizer is split up into two parts and connected by a transmission channel. We compare the performance of standard and hybrid modeling approaches in this environment. The evaluation is done on clean and noisy speech samples taken from the TI digits and the Aurora databases. Our results show that, for this task, the hybrid modeling techniques can outperform standard continuous systems.
Cited by: 3
Improvement of non-negative matrix factorization based language model using exponential models
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034619
M. Novak, R. Mammone
This paper describes the use of exponential models to improve non-negative matrix factorization (NMF) based topic language models for automatic speech recognition. This modeling technique borrows the basic idea from latent semantic analysis (LSA), which is typically used in information retrieval. An improvement was achieved when exponential models were used to estimate the a posteriori topic probabilities for an observed history. This method improved the perplexity of the NMF model, resulting in a 24% perplexity improvement overall when compared to a trigram language model.
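As a reminder of the underlying factorization: NMF approximates a nonnegative word-document count matrix V as W H with nonnegative factors, so the columns of W behave like topics. The sketch below uses invented counts and plain Lee-Seung multiplicative updates; the paper's topic language model and the exponential models layered on top are not reproduced here.

```python
# Minimal NMF sketch: V (words x docs) ~= W (words x topics) @ H (topics x docs).
# Counts are invented; updates are the standard Lee-Seung squared-error rules.
import numpy as np

rng = np.random.default_rng(0)
V = np.array([[5, 0],
              [4, 1],
              [0, 6],
              [1, 5]], dtype=float)   # toy word-document counts

k = 2                                  # number of latent topics
W = rng.random((4, k)) + 0.1
H = rng.random((k, 2)) + 0.1

err0 = np.linalg.norm(V - W @ H)       # initial reconstruction error
for _ in range(200):                   # multiplicative updates keep factors >= 0
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Column-normalizing W gives P(word | topic); H gives per-document topic weights.
P = W / W.sum(axis=0)
print(np.round(W @ H, 1))              # reconstruction should be close to V
```

In the language-model setting, the topic mixture from H (estimated from the recognition history) reweights the unigram columns of W; the paper's contribution is estimating those a posteriori topic probabilities with exponential models instead.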
Cited by: 7
Automatic evaluation methods of a speech translation system's capability
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034661
F. Sugaya, K. Yasuda, T. Takezawa, S. Yamamoto
The main goal of the paper is to propose automatic schemes for the translation paired comparison method, which the authors previously proposed to precisely evaluate a speech translation system's capability. In the method, the outputs of the speech translation system are subjectively compared with the results of native Japanese speakers taking the Test of English for International Communication (TOEIC), which is used as a measure of a person's speech translation capability. Experiments are conducted on TDMT, a subsystem of the Japanese-to-English speech translation system ATR-MATRIX developed at ATR Interpreting Telecommunications Research Laboratories. The winning rate of TDMT correlates well with the TOEIC scores of the examinees. A regression analysis of the subjective results shows that the translation capability of TDMT matches that of a person scoring around 700 on the TOEIC. The automatic evaluation methods use DP-based similarity, computed from DP distances between a translation output and multiple translation answers. The answers are collected by two methods: paraphrasing and querying a parallel corpus. For both types of collection, the similarity shows the same good correlation with the examinees' TOEIC scores as the subjective winning rate. Regression analysis using similarity places the system's matched point around 750. We also show the effects of paraphrased data.
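The DP distance mentioned above is, in essence, word-level edit distance computed by dynamic programming. A sketch of turning it into a similarity against multiple reference answers, where the normalization by the longer sequence is an assumed choice:

```python
# DP (edit) distance between word sequences, and a similarity against
# multiple reference answers.  The normalization scheme is an assumption.

def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                          d[i][j - 1] + 1,                      # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return d[len(a)][len(b)]

def similarity(output, answers):
    """Best normalized match of a translation output against several answers."""
    ow = output.split()
    best = 0.0
    for ans in answers:
        aw = ans.split()
        score = 1 - edit_distance(ow, aw) / max(len(ow), len(aw))
        best = max(best, score)
    return best

print(similarity("i would like a room",
                 ["i want a room", "i would like a room please"]))
```

Using multiple answers (paraphrases or parallel-corpus matches) is what lets a surface-level DP similarity track human judgments: any one reference penalizes legitimate rewordings, but the maximum over many references does not.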
Cited by: 0
Vocabulary independent speech recognition using particles
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034650
E. Whittaker, J.M. Van Thong, P. Moreno
A method is presented for performing speech recognition that does not depend on a fixed word vocabulary. Particles are used as the recognition units in a speech recognition system, permitting word-vocabulary-independent speech decoding. A particle represents a concatenated phone sequence. Each string of particles representing a word in the one-best hypothesis from the particle speech recognizer is expanded into a list of phonetically similar word candidates using a phone confusion matrix. The resulting word graph is then re-decoded using a word language model to produce the final word hypothesis. Preliminary results on the DARPA HUB4 97 and 98 evaluation sets using word bigram re-decoding of the particle hypotheses show a WER between 2.2% and 2.9% higher than that of a word bigram speech recognizer of comparable complexity. The method has potential applications in spoken document retrieval for recovering out-of-vocabulary words and in client-server based speech recognition.
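The expansion step can be sketched as follows: each phone in the recognized particle string is replaced by its set of commonly confused phones, and the cross-product of those sets yields the phonetically similar candidates to match against a word lexicon. The confusion sets below are toy values, not the paper's matrix.

```python
# Sketch of expanding a recognized phone sequence via a phone confusion
# matrix.  The confusion sets here are invented for illustration.
from itertools import product

# Hypothetical confusion sets: each phone plus the phones it is confused with.
confusions = {"p": ["p", "b"], "ih": ["ih", "iy"], "n": ["n", "m"]}

def expand(phones):
    """All phone sequences reachable by substituting per-phone confusions."""
    options = [confusions.get(ph, [ph]) for ph in phones]
    return [" ".join(seq) for seq in product(*options)]

candidates = expand(["p", "ih", "n"])
print(candidates)  # 8 sequences, from 'p ih n' to 'b iy m'
```

In the full system each candidate sequence would be scored (e.g., by confusion probabilities) and looked up in the pronunciation lexicon to build the word graph that the word language model then re-decodes.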
Cited by: 12
A comparative study of model-based adaptation techniques for a compact speech recognizer
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034581
F. Thiele, R. Bippus
Many techniques for speaker adaptation have been successfully applied to automatic speech recognition. This paper compares the performance of several adaptation methods with respect to their memory and processing demands. For adaptation of a compact acoustic model with 4k densities, eigenvoices and structural MAP (SMAP) are investigated alongside the well-known techniques of MAP (maximum a posteriori) and MLLR (maximum likelihood linear regression) adaptation. Experimental results are reported for unsupervised on-line adaptation on amounts of adaptation data ranging from 4 to 500 words per speaker. The results show that for small amounts of adaptation data it might be more efficient to employ a larger baseline acoustic model without adaptation. Eigenvoices achieve the lowest word error rates of all adaptation techniques, but SMAP presents a good compromise between memory requirement and accuracy.
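The reason eigenvoices work well with little data is that the adapted model is constrained to the span of a few basis "voices", so only a handful of weights must be estimated. A toy sketch (all numbers invented; real systems estimate the weights by maximum likelihood over HMM state statistics, not the plain least squares shown here):

```python
# Toy eigenvoice adaptation: the speaker's mean supervector is restricted to
# mean_voice + span(eigenvoices), so adaptation estimates only the weights.
# Vectors and stats are invented; the ML estimation of real systems is
# replaced by least squares for illustration.
import numpy as np

mean_voice = np.zeros(6)                        # average-speaker supervector
E = np.array([[1., 0., 0., 1., 0., 0.],         # eigenvoice 1
              [0., 1., 0., 0., 1., 0.]]).T      # eigenvoice 2  (6 x 2 basis)

# Rough per-density mean statistics gathered from a few adaptation words.
observed = np.array([2., 3., 0., 2., 3., 0.])

# Weights in eigenspace, then the adapted supervector:
w, *_ = np.linalg.lstsq(E, observed - mean_voice, rcond=None)
adapted = mean_voice + E @ w
print(np.round(w, 2), np.round(adapted, 2))
```

With only two free parameters instead of thousands of Gaussian means, a few words of speech suffice, which matches the paper's finding that eigenvoices do best at the low-data end.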
Cited by: 2
Introduction of speech interface for mobile information services
Pub Date : 2001-12-09 DOI: 10.1109/ASRU.2001.1034684
H. Nakano
Popular Japanese mobile Web-phones are widely used to connect to Internet providers (IP). The most popular service on mobile Web-phones is E-mail. Currently, users type the messages using the ten standard keys on the phone. Several letters and Kana (Japanese phonetic characters) are assigned to each key, and the user steps through them by tapping the key repeatedly. After inputting several words, the user converts them into Kanji (Chinese character). Kana-Kanji conversion is still improving, and recently fast text input methods have been introduced, but these key input methods are still troublesome. A speech interface is expected to overcome this input difficulty. However, speech interfaces suffer several problems, both technical and social. The paper summarises these problems and looks at some methods by which technical solutions may be found.
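The multi-tap entry described above can be sketched with the standard Latin keypad letter assignment (Kana keypads work the same way, with kana columns per key); each additional tap on a key cycles to the next assigned character:

```python
# Multi-tap ("ten-key") text entry: repeated taps on one key cycle through
# its letters.  Standard Latin keypad shown; Kana keypads behave analogously.
keypad = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

def multitap(taps):
    """taps: list of (key, tap_count); the count-th tap selects a letter,
    wrapping around past the end of the key's letters."""
    return "".join(keypad[k][(n - 1) % len(keypad[k])] for k, n in taps)

print(multitap([("4", 2), ("4", 3)]))  # 'hi'
```

Counting the taps makes the inefficiency concrete: even a short word costs several presses per letter, and Japanese input adds a Kana-to-Kanji conversion step on top, which is the burden a speech interface aims to remove.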
Cited by: 0