
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU): Latest Publications

The RWTH Arabic-to-English spoken language translation system
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430145
Oliver Bender, E. Matusov, Stefan Hahn, Sasa Hasan, Shahram Khadivi, H. Ney
We present the RWTH phrase-based statistical machine translation system designed for the translation of Arabic speech into English text. This system was used in the Global Autonomous Language Exploitation (GALE) Go/No-Go Translation Evaluation 2007. Using a two-pass approach, we first generate n-best translation candidates and then rerank these candidates using additional models. We give a short review of the decoder as well as of the models used in both passes. We stress the difficulties of spoken language translation, i.e. how to combine the recognition and translation systems and how to compensate for missing punctuation. In addition, we cover our work on domain adaptation for the applied language models. We present translation results for the official GALE 2006 evaluation set and the GALE 2007 development set.
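The second pass described above can be illustrated with a toy reranking sketch: each n-best candidate carries first-pass model scores, and a log-linear combination of those scores reorders the list. The feature names, weights, and candidates below are hypothetical, not the paper's actual models.

```python
# Hypothetical sketch of second-pass n-best reranking: combine per-candidate
# model scores log-linearly with tuned weights and sort. Feature names
# ("tm" = translation model, "lm" = language model) are illustrative only.

def rerank(nbest, weights):
    """Return the n-best list sorted by weighted sum of feature scores."""
    def score(cand):
        return sum(weights[f] * v for f, v in cand["features"].items())
    return sorted(nbest, key=score, reverse=True)

nbest = [
    {"text": "the meeting is tomorrow", "features": {"tm": -4.1, "lm": -6.0}},
    {"text": "the meeting in tomorrow", "features": {"tm": -3.9, "lm": -7.5}},
]
weights = {"tm": 1.0, "lm": 0.8}  # weights would normally be tuned on dev data
best = rerank(nbest, weights)[0]["text"]
```

Here the language model score outweighs the slightly better translation score of the second candidate, so the fluent hypothesis wins.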
Citations: 28
A data-centric architecture for data-driven spoken dialog systems
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430168
S. Varges, G. Riccardi
Data is becoming increasingly crucial for training and (self-) evaluation of spoken dialog systems (SDS). Data is used to train models (e.g. acoustic models) and is 'forgotten'. Data is generated on-line from the different components of the SDS system, e.g. the dialog manager, as well as from the world it is interacting with (e.g. news streams, ambient sensors etc.). The data is used to evaluate and analyze conversational systems both on-line and off-line. We need to be able to query such heterogeneous data for further processing. In this paper we present an approach with two novel components: first, an architecture for SDSs that takes a data-centric view, ensuring persistency and consistency of data as it is generated. The architecture is centered around a database that stores dialog data beyond the lifetime of individual dialog sessions, facilitating dialog mining, annotation, and logging. Second, we take advantage of the statefulness of the data-centric architecture by means of a lightweight, reactive and inference-based dialog manager that itself is stateless. The feasibility of our approach has been validated within a prototype of a phone-based university help-desk application. We detail SDS architecture and dialog management, model, and data representation.
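The central idea of the architecture, a database that persists dialog turns beyond the lifetime of a session so they can be mined off-line, can be sketched minimally with SQLite. The schema and data below are assumptions for illustration, not the paper's actual design.

```python
import sqlite3

# Minimal sketch (toy schema) of a dialog store that persists turns beyond
# a single session, so off-line analysis can query across sessions.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE turns (
    session_id TEXT, turn INTEGER, speaker TEXT, utterance TEXT)""")
rows = [
    ("s1", 1, "user",   "when is the registrar open"),
    ("s1", 2, "system", "the registrar is open 9 to 5"),
    ("s2", 1, "user",   "how do I reach the registrar"),
]
db.executemany("INSERT INTO turns VALUES (?, ?, ?, ?)", rows)
db.commit()

# Cross-session mining query: every session where a user asked about the registrar.
hits = db.execute(
    "SELECT session_id FROM turns WHERE speaker='user' "
    "AND utterance LIKE '%registrar%' ORDER BY session_id").fetchall()
sessions = [s for (s,) in hits]
```

Because the store outlives individual sessions, the same query serves both on-line monitoring and later annotation or mining.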
Citations: 5
Improvements in phone based audio search via constrained match with high order confusion estimates
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430191
U. Chaudhari, M. Picheny
This paper investigates an approximate similarity measure for searching in phone based audio transcripts. The baseline method combines elements found in the literature to form an approach based on a phonetic confusion matrix that is used to determine the similarity of an audio document and a query, both of which are parsed into phone N-grams. Experimental results show comparable performance to other approaches in the literature. Extensions of the approach are developed based on a constrained form of the similarity measure that can take into consideration the system dependent errors that can occur. This is done by accounting for higher order confusions, namely of phone bi-grams and tri-grams. Results show improved performance across a variety of system configurations.
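The confusion-matrix similarity idea can be sketched as a sliding-window match: score a phone-string query against each window of the transcript by summing confusion probabilities between corresponding phones. The confusion values below are invented toy numbers, and this sketch uses unigram phones rather than the paper's N-grams.

```python
# Illustrative sketch: score a phone query against a phone transcript by
# sliding a window and summing per-phone confusion probabilities.
# CONF[(q, w)] = (toy) probability that query phone q was recognized as w.
CONF = {
    ("p", "p"): 0.9, ("p", "b"): 0.1,
    ("a", "a"): 0.8, ("a", "e"): 0.2,
    ("t", "t"): 0.9, ("t", "d"): 0.1,
}

def window_score(query, window):
    return sum(CONF.get((q, w), 0.0) for q, w in zip(query, window))

def best_match(query, transcript):
    n = len(query)
    scores = [(window_score(query, transcript[i:i + n]), i)
              for i in range(len(transcript) - n + 1)]
    return max(scores)  # (score, start position) of the best window

query = ["p", "a", "t"]
transcript = ["b", "a", "d", "p", "a", "t", "e"]
score, pos = best_match(query, transcript)
```

The exact occurrence at position 3 scores highest; the paper's extension would additionally weigh bi-gram and tri-gram confusions to capture system-dependent error patterns.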
Citations: 41
Graph-based learning for phonetic classification
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430138
Andrei Alexandrescu, K. Kirchhoff
We introduce graph-based learning for acoustic-phonetic classification. In graph-based learning, training and test data points are jointly represented in a weighted undirected graph characterized by a weight matrix indicating similarities between different samples. Classification of test samples is achieved by label propagation over the entire graph. Although this learning technique is commonly applied in semi-supervised settings, we show how it can also be used as a postprocessing step to a supervised classifier by imposing additional regularization constraints based on the underlying data manifold. We also present a technique to adapt graph-based learning to large datasets and evaluate our system on a vowel classification task. Our results show that graph-based learning improves significantly over state-of-the art baselines.
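Label propagation over a weighted graph, the core mechanism above, can be shown on a toy graph: labeled nodes keep their label distribution fixed, and each unlabeled node repeatedly takes the similarity-weighted average of its neighbours' distributions. Graph weights and labels here are invented for illustration.

```python
# Toy sketch of label propagation on a weighted undirected graph.
# "l0" and "l1" are labeled (clamped) nodes; "u" is unlabeled.
edges = {("l0", "u"): 1.0, ("l1", "u"): 3.0}  # symmetric similarity weights

def weight(a, b):
    return edges.get((a, b)) or edges.get((b, a)) or 0.0

labels = {"l0": [1.0, 0.0], "l1": [0.0, 1.0]}  # seed label distributions
nodes = ["l0", "l1", "u"]
dist = {n: labels.get(n, [0.5, 0.5]) for n in nodes}

for _ in range(20):  # iterate propagation (converges quickly on this graph)
    for n in nodes:
        if n in labels:
            continue  # labeled nodes stay clamped
        total = sum(weight(n, m) for m in nodes if m != n)
        dist[n] = [sum(weight(n, m) * dist[m][k] for m in nodes if m != n) / total
                   for k in range(2)]

predicted = max(range(2), key=lambda k: dist["u"][k])
```

The unlabeled node ends up dominated by its more strongly connected neighbour, which is exactly the manifold assumption the paper exploits.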
Citations: 19
Data selection for speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430173
Yi Wu, Rong Zhang, Alexander I. Rudnicky
This paper presents a strategy for efficiently selecting informative data from large corpora of transcribed speech. We propose to choose data uniformly according to the distribution of some target speech unit (phoneme, word, character, etc). In our experiment, in contrast to the common belief that "there is no data like more data", we found it possible to select a highly informative subset of data that produces recognition performance comparable to a system that makes use of a much larger amount of data. At the same time, our selection process is efficient and fast.
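One way to realize "choosing data uniformly according to the distribution of some target speech unit" is a greedy selection that repeatedly adds the utterance contributing the most under-represented units. This is a hedged sketch of the idea, not the paper's exact algorithm; the corpus and unit inventory are toy data.

```python
from collections import Counter

# Toy corpus: utterance id -> sequence of target units (here, words).
corpus = {
    "u1": ["a", "b"],
    "u2": ["a", "a", "a"],
    "u3": ["c", "d"],
}

def gain(units, counts):
    # Reward distinct units that are still under-represented in the selection.
    return sum(1.0 / (1 + counts[u]) for u in set(units))

selected, counts = [], Counter()
remaining = dict(corpus)
for _ in range(2):  # selection budget: two utterances
    best = max(remaining, key=lambda k: gain(remaining[k], counts))
    selected.append(best)
    counts.update(remaining.pop(best))
```

The repetitive utterance "u2" is skipped: it adds nothing new, which is how a small selected subset can stay informative.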
Citations: 76
Robust speaker clustering strategies to data source variation for improved speaker diarization
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430121
Kyu Jeong Han, Samuel Kim, Shrikanth S. Narayanan
Agglomerative hierarchical clustering (AHC) has been widely used in speaker diarization systems to classify speech segments in a given data source by speaker identity, but is known not to be robust to data source variation. In this paper, we identify one of the key potential sources of this variability that negatively affects clustering error rate (CER), namely short speech segments, and propose three solutions to tackle this issue. Through experiments on various meeting conversation excerpts, the proposed methods are shown to outperform simple AHC in terms of relative CER improvements in the range of 17-32%.
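The baseline AHC procedure the paper improves on can be sketched in a few lines: start with one cluster per segment and repeatedly merge the closest pair until the closest distance exceeds a stopping threshold. The 1-D "embeddings" and the centroid distance below are toy stand-ins for real speaker models.

```python
# Minimal AHC sketch over toy 1-D segment embeddings.
def centroid(cluster):
    return sum(cluster) / len(cluster)

def ahc(segments, threshold):
    clusters = [[s] for s in segments]
    while len(clusters) > 1:
        pairs = [(abs(centroid(a) - centroid(b)), i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        d, i, j = min(pairs)           # closest pair of clusters
        if d > threshold:
            break                      # stopping criterion
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two speakers: embeddings near 0.0 and near 5.0.
result = ahc([0.0, 0.2, 5.0, 5.3, 0.1], threshold=1.0)
```

With real data, very short segments yield unreliable cluster models and bad merges, which is the failure mode the paper's three strategies address.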
Citations: 15
OOV detection by joint word/phone lattice alignment
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430159
Hui-Ching Lin, J. Bilmes, D. Vergyri, K. Kirchhoff
We propose a new method for detecting out-of-vocabulary (OOV) words for large vocabulary continuous speech recognition (LVCSR) systems. Our method is based on performing a joint alignment between independently generated word and phone lattices, where the word-lattice is aligned via a recognition lexicon. Based on a similarity measure between phones, we can locate highly mis-aligned regions of time, and then specify those regions as candidate OOVs. This novel approach is implemented using the framework of graphical models (GMs), which enable fast flexible integration of different scores from word lattices, phone lattices, and the similarity measures. We evaluate our method on switchboard data using RT-04 as test set. Experimental results show that our approach provides a promising and scalable new way to detect OOV for LVCSR.
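The mis-alignment intuition can be illustrated on 1-best sequences instead of lattices: expand each hypothesized word to its lexicon phones, compare against the phone recognizer's output for the same span, and flag words with low agreement as OOV candidates. The lexicon, phone strings, and threshold below are toy assumptions.

```python
# Hedged 1-best sketch of the word/phone mis-alignment idea (the paper
# operates on full lattices with graphical models; this is the intuition only).
LEXICON = {"cat": ["k", "ae", "t"], "sat": ["s", "ae", "t"]}

def agreement(expected, observed):
    hits = sum(1 for e, o in zip(expected, observed) if e == o)
    return hits / max(len(expected), len(observed))

def flag_oov(word_spans, threshold=0.5):
    flagged = []
    for word, observed_phones in word_spans:
        if agreement(LEXICON[word], observed_phones) < threshold:
            flagged.append(word)  # highly mis-aligned region: OOV candidate
    return flagged

# "cat" agrees with its lexicon phones; "sat" was hypothesized over a
# region whose phones look nothing like the lexicon entry.
spans = [("cat", ["k", "ae", "t"]), ("sat", ["sh", "iy", "p"])]
oov = flag_oov(spans)
```

A soft phone-similarity measure, as in the paper, would replace the exact-match test so that near-confusions ("t" vs "d") are not penalized as hard as true mismatches.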
Citations: 65
Joint decoding of multiple speech patterns for robust speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430090
N.U. Nair, T. Sreenivas
We are addressing a new problem of improving automatic speech recognition performance, given multiple utterances of patterns from the same class. We have formulated the problem of jointly decoding K multiple patterns given a single hidden Markov model. It is shown that such a solution is possible by aligning the K patterns using the proposed multi pattern dynamic time warping algorithm followed by the constrained multi pattern Viterbi algorithm. The new formulation is tested in the context of speaker independent isolated word recognition for both clean and noisy patterns. When 10 percent of speech is affected by a burst noise at -5 dB signal to noise ratio (local), it is shown that joint decoding using only two noisy patterns reduces the noisy speech recognition error rate to about 51 percent, when compared to the single pattern decoding using the Viterbi Algorithm. In contrast, a simple maximization of individual pattern likelihoods provides only about a 7 percent reduction in error rate.
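The building block underneath the proposed multi-pattern alignment is ordinary dynamic time warping between a pair of sequences; the paper generalizes this to K patterns. A standard two-sequence DTW over scalar features looks like this (the sequences are toy data):

```python
# Classic dynamic time warping distance between two sequences: the
# two-pattern special case of the alignment the paper extends to K patterns.
def dtw(x, y):
    INF = float("inf")
    n, m = len(x), len(y)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # local frame distance
            # extend the cheapest of: insertion, deletion, match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# The second sequence is the first with one frame repeated: warping
# absorbs the repetition at zero cost.
dist = dtw([1, 2, 3], [1, 2, 2, 3])
```

In the joint-decoding setting, the warped alignment lets the constrained Viterbi pass score all K patterns against one HMM state sequence simultaneously.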
Citations: 22
Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430129
Tara N. Sainath, D. Kanevsky, B. Ramabhadran
In many pattern recognition tasks, given some input data and a model, a probabilistic likelihood score is often computed to measure how well the model describes the data. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures, though recently they have been used to derive a gradient steepness measurement to evaluate the quality of the model to match the distribution of the data. In this paper, we explore applying the EBW gradient steepness metric in the context of Hidden Markov Models (HMMs) for recognition of broad phonetic classes and present a detailed analysis and results on the use of this gradient metric on the TIMIT corpus. We find that our gradient metric is able to outperform the baseline likelihood method, and offers improvements in noisy conditions.
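A commonly cited form of the EBW transformation for a Gaussian mean combines numerator (correct-transcript) and denominator (competitor) sufficient statistics with a smoothing constant D that keeps the update stable. The sketch below shows that update shape with toy statistics; it is an assumed illustrative form, not the specific gradient-steepness metric this paper derives.

```python
# Hedged sketch of a scalar extended Baum-Welch mean update:
#   mu_new = (num_sum - den_sum + D * mu_old) / (num_count - den_count + D)
# where num_* / den_* are occupancy-weighted sums and counts from the
# numerator and denominator lattices, and D is a smoothing constant.
def ebw_mean(num_sum, num_count, den_sum, den_count, mu_old, D):
    return (num_sum - den_sum + D * mu_old) / (num_count - den_count + D)

# Toy statistics: correct data pulls the mean up, competitors pull it down,
# and D interpolates toward the old mean.
mu_new = ebw_mean(num_sum=12.0, num_count=5.0,
                  den_sum=2.0, den_count=1.0, mu_old=2.0, D=4.0)
```

Large D keeps the update close to the old parameters; how sharply the objective improves per step is the kind of "gradient steepness" quantity the paper turns into a scoring metric.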
Citations: 11
Empirical study of neural network language models for Arabic speech recognition
Pub Date : 2007-12-01 DOI: 10.1109/ASRU.2007.4430100
Ahmad Emami, L. Mangu
In this paper we investigate the use of neural network language models for Arabic speech recognition. By using a distributed representation of words, the neural network model allows for more robust generalization and is better able to fight the data sparseness problem. We investigate different configurations of the neural probabilistic model, experimenting with such parameters as N-gram order, output vocabulary, normalization method, and model size and parameters. Experiments were carried out on Arabic broadcast news and broadcast conversations data and the optimized neural network language models showed significant improvements over the baseline N-gram model.
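The distributed-representation idea behind such models can be shown with a tiny feed-forward pass: look up embeddings for the N-1 context words, concatenate them, apply a linear layer, and softmax over the output vocabulary. All vocabularies and weights below are toy values, and real models add a hidden nonlinearity and trained parameters.

```python
import math

# Tiny sketch of a feed-forward neural LM forward pass with toy weights.
EMB = {"the": [0.1, 0.2], "cat": [0.3, -0.1]}  # 2-dim word embeddings
VOCAB = ["sat", "ran"]                          # toy output vocabulary
W = [[0.5, -0.2, 0.1, 0.3],                     # one row per output word,
     [-0.4, 0.6, 0.2, -0.1]]                    # over the 4-dim context vector

def softmax(z):
    m = max(z)                       # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(context):
    h = [v for w in context for v in EMB[w]]    # concatenated embeddings
    logits = [sum(wi * hi for wi, hi in zip(row, h)) for row in W]
    return dict(zip(VOCAB, softmax(logits)))

p = predict(["the", "cat"])  # P(next word | "the cat")
```

Because similar words get similar embeddings, probability mass generalizes across contexts never seen in training, which is how these models fight data sparseness; the softmax normalization over the output vocabulary is also the main cost the paper's "normalization method" experiments target.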
Citations: 52