
Latest publications: 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

A language modeling approach to question answering on speech transcripts
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430112
Matthias H. Heie, E. Whittaker, Josef R. Novak, S. Furui
This paper presents a language modeling approach to sentence retrieval for Question Answering (QA) that we used in Question Answering on speech transcripts (QAst), a pilot task at the Cross Language Evaluation Forum (CLEF) evaluations 2007. A language model (LM) is generated for each sentence and these models are combined with document LMs to take advantage of contextual information. A query expansion technique using class models is proposed and included in our framework. Finally, our method's impact on exact answer extraction is evaluated. We show that combining sentence LMs with document LMs significantly improves sentence retrieval performance, and that this sentence retrieval approach leads to better answer extraction performance.
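The combination of sentence and document LMs described above can be sketched as linear interpolation of unigram models. This is a minimal illustration, not the paper's exact configuration: the interpolation weights, the unigram-only models, and the collection back-off below are all illustrative assumptions.

```python
import math
from collections import Counter

def unigram_prob(word, counts, total):
    """Maximum-likelihood unigram probability (0 if the model is empty)."""
    return counts[word] / total if total else 0.0

def sentence_score(query, sentence, document, collection,
                   lam_s=0.6, lam_d=0.3):
    """Log-probability of the query under a sentence LM interpolated
    with the document LM (context) and a collection LM (smoothing).
    All inputs are token lists; weights are illustrative."""
    s_cnt, d_cnt, c_cnt = Counter(sentence), Counter(document), Counter(collection)
    lam_c = 1.0 - lam_s - lam_d
    score = 0.0
    for w in query:
        p = (lam_s * unigram_prob(w, s_cnt, len(sentence))
             + lam_d * unigram_prob(w, d_cnt, len(document))
             + lam_c * unigram_prob(w, c_cnt, len(collection)))
        score += math.log(p) if p > 0 else float("-inf")
    return score

# Example: a sentence containing the query words outranks one that does not.
doc = "the cat sat on the mat while the dog barked".split()
print(sentence_score("cat sat".split(), "the cat sat on the mat".split(), doc, doc))
```

Ranking candidate sentences by this score, with the document LM term supplying context beyond the sentence itself, is the retrieval idea the abstract describes.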
Citations: 2
Call classification for automated troubleshooting on large corpora
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430110
Keelan Evanini, David Suendermann-Oeft, R. Pieraccini
This paper compares six algorithms for call classification in the framework of a dialog system for automated troubleshooting. The comparison is carried out on large datasets, each consisting of over 100,000 utterances from two domains: television (TV) and Internet (INT). In spite of the high number of classes (79 for TV and 58 for INT), the best classifier (maximum entropy on word bigrams) achieved more than 77% classification accuracy on the TV dataset and 81% on the INT dataset.
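The best-performing configuration, maximum entropy on word bigrams, can be sketched with a multinomial logistic-regression classifier (equivalent to a maxent model) over unigram and bigram counts. The scikit-learn pipeline and the toy utterances and labels below are my own illustrative stand-ins; the real system used corpora of over 100,000 utterances and dozens of classes.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Maxent-style call classifier over word unigrams and bigrams (toy data).
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # word unigram + bigram features
    LogisticRegression(max_iter=1000),    # multinomial logistic regression = maxent
)
utterances = [
    "my tv screen is black", "there is no picture on the tv",
    "my internet is down", "i cannot connect to the internet",
]
labels = ["TV", "TV", "INT", "INT"]
clf.fit(utterances, labels)
print(clf.predict(["the internet is not working"]))
```

Each incoming utterance is mapped to one of the troubleshooting classes; with realistic data volumes the class inventory and feature space grow, but the model family stays the same.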
Citations: 19
Variational Kullback-Leibler divergence for Hidden Markov models
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430132
J. Hershey, P. Olsen, Steven J. Rennie
Divergence measures are widely used tools in statistics and pattern recognition. The Kullback-Leibler (KL) divergence between two hidden Markov models (HMMs) would be particularly useful in the fields of speech and image recognition. Whereas the KL divergence is tractable for many distributions, including Gaussians, it is not in general tractable for mixture models or HMMs. Recently, variational approximations have been introduced to efficiently compute the KL divergence and Bhattacharyya divergence between two mixture models, by reducing them to the divergences between the mixture components. Here we generalize these techniques to approach the divergence between HMMs using a recursive backward algorithm. Two such methods are introduced, one of which yields an upper bound on the KL divergence, the other of which yields a recursive closed-form solution. The KL and Bhattacharyya divergences, as well as a weighted edit-distance technique, are evaluated for the task of predicting the confusability of pairs of words.
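For two Gaussian mixtures f and g, the variational approximation the authors build on has the closed form D(f||g) ≈ Σ_a w_a log( Σ_a' w_a' exp(−KL(f_a||f_a')) / Σ_b v_b exp(−KL(f_a||g_b)) ), where each KL term is between single Gaussians and is available analytically. A minimal univariate sketch follows; extending it to HMMs via the recursive backward algorithm, as the paper does, is beyond this snippet.

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Analytic KL divergence between two univariate Gaussians N(m1,s1^2), N(m2,s2^2)."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def variational_kl(f, g):
    """Variational approximation to KL(f || g) for Gaussian mixtures
    f, g given as lists of (weight, mean, std) triples."""
    total = 0.0
    for wa, ma, sa in f:
        num = sum(wap * math.exp(-kl_gauss(ma, sa, mp, sp))
                  for wap, mp, sp in f)
        den = sum(vb * math.exp(-kl_gauss(ma, sa, mb, sb))
                  for vb, mb, sb in g)
        total += wa * math.log(num / den)
    return total
```

By construction the approximation is exactly zero when f and g are identical, and it only requires Gaussian-to-Gaussian KL terms, which is what makes it tractable where the exact mixture KL is not.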
Citations: 22
Lattice-based Viterbi decoding techniques for speech translation
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430143
G. Saon, M. Picheny
We describe a cardinal-synchronous Viterbi decoder for statistical phrase-based machine translation which can operate on general ASR lattices (as opposed to confusion networks). The decoder implements constrained source reordering on the input lattice and makes use of an outbound distortion model to score the possible reorderings. The phrase table, representing the decoding search space, is encoded as a weighted finite-state acceptor which is determinized and minimized. At a high level, the search proceeds by performing simultaneous transitions in two pairs of automata: (input lattice, phrase table FSM) and (phrase table FSM, target language model). An alternative decoding strategy that we explore is to break the search into two independent subproblems: first, we perform monotone lattice decoding and find the best foreign path through the ASR lattice; then, we decode this path with reordering using standard sentence-based SMT. We report experimental results on several test sets of a large-scale Arabic-to-English speech translation task in the context of the DARPA Global Autonomous Language Exploitation (GALE) project. The results indicate that, for monotone search, lattice-based decoding outperforms 1-best decoding, whereas for search with reordering, only the second decoding strategy was found to be superior to 1-best decoding. In both cases, the improvements hold only for shallow lattices.
Citations: 15
Discriminative training of multi-state barge-in models
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430137
A. Ljolje, Vincent Goffin
A barge-in system designed to reflect the design of the acoustic model used in commercial applications has been built and evaluated. It uses standard hidden Markov model structures, cepstral features and multiple hidden Markov models for both the speech and non-speech parts of the model. It is tested on a large number of real-world databases using noisy speech onset positions which were determined by forced alignment of lexical transcriptions with the recognition model. The ML trained model achieves low false rejection rates at the expense of high false acceptance rates. The discriminative training using the modified algorithm based on the maximum mutual information criterion reduces the false acceptance rates by a half, while preserving the low false rejection rates. Combining an energy based voice activity detector with the hidden Markov model based barge-in models achieves the best performance.
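The energy-based voice activity detector used in the best-performing combination can be approximated by simple frame-energy thresholding. The frame length and threshold below are placeholder values of my choosing, not those of the paper's system, which pairs the VAD with HMM-based barge-in models.

```python
import numpy as np

def energy_vad(signal, frame_len=400, threshold_db=-35.0):
    """Mark a frame as speech when its log energy is within
    `threshold_db` of the loudest frame in the utterance."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1) + 1e-12      # avoid log(0)
    log_e = 10.0 * np.log10(energy / energy.max())  # dB relative to peak frame
    return log_e > threshold_db

# Example: low-level noise followed by a loud tone.
sig = np.concatenate([np.full(4000, 1e-3),
                      0.5 * np.sin(2 * np.pi * 200 * np.arange(4000) / 16000)])
print(energy_vad(sig))
```

In a deployed barge-in system such an energy gate cheaply rejects background noise, and the HMM speech/non-speech models then adjudicate the frames it lets through.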
Citations: 3
Comparing one and two-stage acoustic modeling in the recognition of emotion in speech
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430180
Björn Schuller, Bogdan Vlasenko, Ricardo Minguez, G. Rigoll, A. Wendemuth
In the search for a standard unit for use in recognition of emotion in speech, a whole turn, that is, the full stretch of speech by one person in a conversation, is common. Within applications, such turns often seem favorable. Yet sub-turn entities are known to be highly effective. In this respect, a two-stage approach is investigated that provides higher temporal resolution by chunking speech turns according to acoustic properties, followed by multi-instance learning to map individual chunk analyses back to the turn level. For chunking, fast pre-segmentation into emotionally quasi-stationary segments is performed by one-pass Viterbi beam search with token passing based on MFCCs. Chunk analysis is realized by brute-force construction of a large feature space with subsequent subset selection, SVM classification, and speaker normalization. Extensive tests reveal differences compared to one-stage processing. Alternatively, syllables are used for chunking.
Citations: 44
Advances in Arabic broadcast news transcription at RWTH
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430154
David Rybach, Stefan Hahn, C. Gollan, R. Schlüter, H. Ney
This paper describes the RWTH speech recognition system for Arabic. Several design aspects of the system, including cross-adaptation, multiple system design and combination, are analyzed. We summarize the semi-automatic lexicon generation for Arabic using a statistical approach to grapheme-to-phoneme conversion and pronunciation statistics. Furthermore, a novel ASR-based audio segmentation algorithm is presented. Finally, we discuss practical approaches for parallelized acoustic training and memory efficient lattice rescoring. Systematic results are reported on recent GALE evaluation corpora.
Citations: 30
Speech recognition with localized time-frequency pattern detectors
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430135
K. Schutte, James R. Glass
A method for acoustic modeling of speech is presented which is based on learning and detecting the occurrence of localized time-frequency patterns in a spectrogram. A boosting algorithm is applied to both build classifiers and perform feature selection from a large set of features derived by filtering spectrograms. Initial experiments are performed to discriminate digits in the Aurora database. The system succeeds in learning sequences of localized time-frequency patterns which are highly interpretable from an acoustic-phonetic viewpoint. While the work and the results are preliminary, they suggest that pursuing these techniques further could lead to new approaches to acoustic modeling for ASR which are more noise robust and offer better encoding of temporal dynamics than typical features such as frame-based cepstra.
Citations: 14
Dealing with cross-lingual aspects in spoken name recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430149
F. Stouten, J. Martens
The development of an automatic speech recognizer (ASR) that can accurately recognize spoken names belonging to a large lexicon is still a big challenge. One of the bottlenecks is that many names contain elements of foreign-language origin, and native speakers can adopt very different pronunciations of these elements, ranging from completely nativized to completely foreignized. In this paper we further develop a recently proposed method for improving the recognition of foreign proper names spoken by native speakers. The main idea is to combine the standard acoustic model scores with scores emerging from a phonologically inspired back-off model that was trained on native speech only. This means that the proposed method does not require the development of any foreign phoneme models on foreign speech data. By applying our method to a baseline Dutch recognizer (comprising Dutch acoustic models), we could reduce the name error rate for French and English names by a considerable amount.
Citations: 7
The IBM 2007 speech transcription system for European parliamentary speeches
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430158
B. Ramabhadran, O. Siohan, A. Sethy
TC-STAR is a European Union-funded speech-to-speech translation project to transcribe, translate, and synthesize European Parliamentary Plenary Speeches (EPPS). This paper describes IBM's English speech recognition system submitted to the TC-STAR 2007 evaluation. Language model adaptation based on clustering, together with data selection using relative entropy minimization, provided significant gains in the 2007 evaluation. The additional advances over the 2006 system that we present in this paper include unsupervised training of acoustic and language models, and a system architecture based on cross-adaptation across complementary systems and on system combination through an ensemble of systems generated with randomized decision-tree state-tying. These advances reduced the error rate by 30% relative to the best-performing system in the TC-STAR 2006 evaluation on the 2006 English development and evaluation test sets, and produced one of the best-performing systems in the 2007 English evaluation, with a word error rate of 7.1%.
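Entropy-driven data selection for LM adaptation can be illustrated with a cross-entropy-difference criterion in the Moore-Lewis style: rank candidate sentences by how much better an in-domain unigram LM explains them than a general one. This is a related, simplified stand-in of my own construction, not necessarily the exact relative-entropy criterion IBM used.

```python
import math
from collections import Counter

def ce(tokens, counts, total, vocab):
    """Per-token cross-entropy (bits) under an add-one smoothed unigram LM."""
    return -sum(math.log2((counts[t] + 1) / (total + vocab))
                for t in tokens) / len(tokens)

def rank_for_adaptation(pool, in_domain, general):
    """Sort candidate sentences so that the most in-domain-like come first
    (lowest in-domain minus general cross-entropy difference)."""
    i_cnt, g_cnt = Counter(in_domain), Counter(general)
    vocab = len(set(in_domain) | set(general))
    def score(sent):
        toks = sent.split()
        return (ce(toks, i_cnt, len(in_domain), vocab)
                - ce(toks, g_cnt, len(general), vocab))
    return sorted(pool, key=score)

# Example: parliamentary-style text ranks ahead of unrelated text.
in_domain = "the parliament debates the budget resolution".split()
general = "the cat sat on the mat".split()
print(rank_for_adaptation(["the cat sat", "the parliament debates"],
                          in_domain, general))
```

Selected sentences would then be used to retrain or interpolate the adapted language model.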
Citations: 43