2013 IEEE Workshop on Automatic Speech Recognition and Understanding: Latest Publications

A generalized discriminative training framework for system combination
Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707703
Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, J. Hershey
This paper proposes a generalized discriminative training framework for system combination, which encompasses acoustic modeling (Gaussian mixture models and deep neural networks) and discriminative feature transformation. To improve performance by combining a base system with complementary systems, the complementary systems should perform reasonably well while tending to produce outputs that differ from those of the base system. Although these two somewhat opposing goals are difficult to balance in conventional heuristic combination approaches, our framework provides a new objective function that allows the balance to be adjusted within a sequential discriminative training criterion. We also describe how the proposed method relates to boosting. Experiments on a highly noisy medium-vocabulary speech recognition task (2nd CHiME challenge, track 2) and an LVCSR task (Corpus of Spontaneous Japanese) show the effectiveness of the proposed method compared with a conventional system combination approach.
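As a rough illustration of the balance this abstract describes, the sketch below interpolates a toy accuracy term with a divergence-from-base-system term under a weight alpha; the function name, the KL-based diversity term, and the weight are illustrative assumptions, not the paper's actual sequential discriminative criterion.

```python
# Toy sketch: trade off accuracy of a complementary system against its
# disagreement with a fixed base system (assumed form, not the paper's).
import numpy as np

def toy_combination_objective(comp_logits, base_probs, ref_ids, alpha=0.3):
    """comp_logits: (N, V) complementary-system scores, base_probs: (N, V)
    fixed base-system posteriors, ref_ids: (N,) reference labels.
    Returns a scalar objective to maximize."""
    comp_probs = np.exp(comp_logits - comp_logits.max(axis=1, keepdims=True))
    comp_probs /= comp_probs.sum(axis=1, keepdims=True)

    # Accuracy term: average log-posterior of the reference labels.
    accuracy = np.mean(np.log(comp_probs[np.arange(len(ref_ids)), ref_ids] + 1e-12))

    # Diversity term: average KL divergence from the base system's posteriors,
    # rewarding outputs that differ from the base system.
    diversity = np.mean(np.sum(comp_probs * (np.log(comp_probs + 1e-12)
                                             - np.log(base_probs + 1e-12)), axis=1))
    return accuracy + alpha * diversity

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4))
base = rng.dirichlet(np.ones(4), size=5)
refs = rng.integers(0, 4, size=5)
print(toy_combination_objective(logits, base, refs))
```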
Citations: 7
Expert-based reward shaping and exploration scheme for boosting policy learning of dialogue management
Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707714
Emmanuel Ferreira, F. Lefèvre
This paper investigates the conditions under which expert knowledge can be used to accelerate the policy optimization of a learning agent. Recent work on reinforcement learning for dialogue management has produced sophisticated value-estimation methods that jointly address the exploration/exploitation dilemma, sample efficiency, and non-stationary environments. In this paper, a reward shaping method and an exploration scheme, both based on intuitive hand-coded expert advice, are combined with an efficient temporal-difference-based learning procedure. The key objective is to boost the initial training stage, when the system is not yet reliable enough to interact with real users (e.g. clients). Our claims are illustrated by simulation-based experiments carried out with a state-of-the-art goal-oriented dialogue management framework, the Hidden Information State (HIS).
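For readers unfamiliar with reward shaping, the sketch below shows plain potential-based shaping and an expert-biased exploration step inside tabular Q-learning on a toy chain task; the environment, the potential function, and the hyperparameters are illustrative assumptions, not the paper's HIS-based dialogue setup.

```python
# Toy chain task: potential-based reward shaping plus expert-biased exploration
# inside tabular Q-learning (illustrative, not the paper's dialogue system).
import numpy as np

n_states, goal = 8, 7
gamma, lr, eps = 0.95, 0.1, 0.2
Q = np.zeros((n_states, 2))          # actions: 0 = left, 1 = right

def potential(s):
    # Hand-coded expert potential: states closer to the goal are more valuable.
    return -abs(goal - s)

def step(s, a):
    s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == goal else -0.1  # small step cost, reward at the goal
    return s2, r, s2 == goal

rng = np.random.default_rng(0)
for episode in range(300):
    s, done = 0, False
    while not done:
        if rng.random() < eps:
            # Expert-biased exploration: try the action the potential prefers.
            a = int(potential(min(s + 1, goal)) >= potential(max(s - 1, 0)))
        else:
            a = int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        shaped = r + gamma * potential(s2) - potential(s)   # potential-based shaping
        target = shaped + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += lr * (target - Q[s, a])
        s = s2
print(np.argmax(Q, axis=1))  # greedy policy; action 1 ("right") should dominate
```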
Citations: 16
Probabilistic lexical modeling and unsupervised training for zero-resourced ASR
Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707771
Ramya Rasipuram, Marzieh Razavi, M. Magimai.-Doss
Standard automatic speech recognition (ASR) systems rely on transcribed speech, language models, and pronunciation dictionaries to achieve state-of-the-art performance. The unavailability of these resources limits the languages for which ASR technology can be built. In this paper, we propose a novel zero-resourced ASR approach to train acoustic models that uses only a list of probable words from the language of interest. The proposed approach is based on a Kullback-Leibler divergence based hidden Markov model (KL-HMM), grapheme subword units, knowledge of grapheme-to-phoneme mapping, and graphemic constraints derived from the word list. The approach also exploits existing acoustic and lexical resources available in other, resource-rich languages. Furthermore, we propose unsupervised adaptation of the KL-HMM acoustic model parameters when untranscribed speech data in the target language is available. We demonstrate the potential of the proposed approach through a simulated study on the Greek language.
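A minimal sketch of the KL-HMM scoring idea, assuming toy phone posteriors, a 3-state left-to-right topology, and uniform transitions (none of which come from the paper): each state holds a categorical distribution over classes, and each frame is scored by its KL divergence to that distribution.

```python
# Toy KL-HMM scoring: per-state categorical distributions, frames scored by KL
# divergence, best path found by a simple left-to-right Viterbi pass.
import numpy as np

def kl(p, q, eps=1e-12):
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

rng = np.random.default_rng(1)
n_classes, n_frames, n_states = 5, 6, 3
posteriors = rng.dirichlet(np.ones(n_classes), size=n_frames)   # z_t from a phone classifier
state_dists = rng.dirichlet(np.ones(n_classes), size=n_states)  # y_d, the KL-HMM parameters

# Local cost of emitting frame t from state d: KL(y_d || z_t).
local_cost = np.array([[kl(state_dists[d], posteriors[t]) for d in range(n_states)]
                       for t in range(n_frames)])

# Viterbi over a left-to-right topology (stay in a state or advance one state).
INF = np.inf
D = np.full((n_frames, n_states), INF)
D[0, 0] = local_cost[0, 0]
for t in range(1, n_frames):
    for d in range(n_states):
        prev = min(D[t - 1, d], D[t - 1, d - 1] if d > 0 else INF)
        D[t, d] = prev + local_cost[t, d]
print("best path cost:", D[-1, -1])
```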
Citations: 7
Semi-supervised bootstrapping approach for neural network feature extractor training
Pub Date : 2013-12-01 DOI: 10.1109/ASRU.2013.6707775
F. Grézl, M. Karafiát
This paper presents a bootstrapping approach for neural network training. The neural networks serve as bottleneck feature extractors for a subsequent GMM-HMM recognizer. The recognizer is also used to transcribe untranscribed data and to assign confidence scores. Based on these confidences, segments are selected, mixed with the supervised data, and new NNs are trained. With this approach, it is possible to recover 40-55% of the difference between partially and fully transcribed data (a 3 to 5% absolute improvement over a NN trained on supervised data only). Using the 70-85% of automatically transcribed segments with the highest confidence was found to be optimal for achieving this result.
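A minimal sketch of the confidence-based selection step, assuming hypothetical segment records and a 0.8 keep fraction (the paper reports 70-85% working best): rank the automatically transcribed segments by confidence, keep the top fraction, and pool them with the supervised data before retraining.

```python
# Toy confidence-based data selection for semi-supervised retraining.
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    path: str          # audio file (hypothetical example paths)
    transcript: str    # automatic transcript from the seed recognizer
    confidence: float  # recognizer confidence score in [0, 1]

def select_for_retraining(supervised: List[Segment],
                          auto_transcribed: List[Segment],
                          keep_fraction: float = 0.8) -> List[Segment]:
    ranked = sorted(auto_transcribed, key=lambda s: s.confidence, reverse=True)
    kept = ranked[:int(round(keep_fraction * len(ranked)))]
    return supervised + kept  # training set for the new bottleneck NN

auto = [Segment(f"utt{i}.wav", "hyp", c) for i, c in enumerate([0.9, 0.4, 0.7, 0.95, 0.6])]
sup = [Segment("sup0.wav", "ref", 1.0)]
print(len(select_for_retraining(sup, auto)))  # 1 supervised + 4 kept
```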
Citations: 54
Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling
Pub Date : 2013-09-05 DOI: 10.1109/ASRU.2013.6707747
Tara N. Sainath, L. Horesh, Brian Kingsbury, A. Aravkin, B. Ramabhadran
Hessian-free training has become a popular parallel second-order optimization technique for Deep Neural Network training. This study aims at speeding up Hessian-free training, both by decreasing the amount of data used for training and by reducing the number of Krylov subspace solver iterations used for implicit estimation of the Hessian. In this paper, we develop an L-BFGS based preconditioning scheme that avoids the need to access the Hessian explicitly. Since L-BFGS cannot be regarded as a fixed-point iteration, we further propose the use of flexible Krylov subspace solvers that retain the desired theoretical convergence guarantees of their conventional counterparts. Second, we propose a new sampling algorithm, which geometrically increases the amount of data used for gradient and Krylov subspace iteration calculations. On a 50-hour English Broadcast News task, we find that these methodologies provide roughly a 1.5× speed-up, whereas on a 300-hour Switchboard task they provide over a 2.3× speedup, with no loss in WER. These results suggest that even further speed-ups can be expected as problem scale and complexity grow.
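As background on the preconditioning ingredient, the sketch below implements the standard L-BFGS two-loop recursion, the kind of memory-limited inverse-Hessian estimate that can precondition an inner Krylov solver; the toy curvature pairs are illustrative assumptions, and this is not the paper's full Hessian-free recipe.

```python
# Standard L-BFGS two-loop recursion: apply an approximate inverse Hessian,
# built from (s_k, y_k) curvature pairs, to a vector (here, a gradient).
import numpy as np

def lbfgs_apply(grad, s_hist, y_hist):
    """Approximate H^{-1} @ grad from curvature pairs (s_k, y_k)."""
    q = grad.copy()
    alphas = []
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_hist, y_hist)]
    for s, y, rho in zip(reversed(s_hist), reversed(y_hist), reversed(rhos)):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # Initial scaling from the most recent curvature pair.
    gamma = np.dot(s_hist[-1], y_hist[-1]) / np.dot(y_hist[-1], y_hist[-1])
    r = gamma * q
    for (s, y, rho), a in zip(zip(s_hist, y_hist, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r += (a - b) * s
    return r

rng = np.random.default_rng(0)
dim, m = 10, 5
s_hist = [rng.normal(size=dim) for _ in range(m)]
y_hist = [s + 0.1 * rng.normal(size=dim) for s in s_hist]  # roughly positive curvature
print(lbfgs_apply(rng.normal(size=dim), s_hist, y_hist))
```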
Citations: 21
Improvements to Deep Convolutional Neural Networks for LVCSR
Pub Date : 2013-09-05 DOI: 10.1109/ASRU.2013.6707749
Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, G. Saon, H. Soltau, T. Beran, A. Aravkin, B. Ramabhadran
Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs), as they are better able to reduce spectral variation in the input signal. This has also been confirmed experimentally, with CNNs showing relative improvements in word error rate (WER) of 4-12% over DNNs across a variety of LVCSR tasks. In this paper, we describe different methods to further improve CNN performance. First, we conduct a deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features. Second, we apply various pooling strategies that have shown improvements in computer vision to an LVCSR speech task. Third, we introduce a method to effectively incorporate speaker adaptation, namely fMLLR, into log-mel features. Fourth, we introduce an effective strategy to use dropout during Hessian-free sequence training. We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline. On a larger 400-hour BN task, we find an additional 4-5% relative improvement over our previous best CNN baseline.
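A minimal sketch of two of the ingredients named above, frequency max-pooling of a convolutional feature map and inverted dropout, under an assumed pool size and dropout rate; it is not the paper's exact configuration.

```python
# Toy frequency max-pooling and inverted dropout on a (time x mel) feature map.
import numpy as np

def freq_max_pool(feat_map, pool=3):
    """feat_map: (time, freq) activations; non-overlapping pooling over freq."""
    t, f = feat_map.shape
    f_trim = (f // pool) * pool
    return feat_map[:, :f_trim].reshape(t, f_trim // pool, pool).max(axis=2)

def dropout(acts, rate=0.5, rng=None):
    """Inverted dropout: zero a random subset of units, rescale the survivors."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(acts.shape) >= rate
    return acts * mask / (1.0 - rate)

acts = np.random.default_rng(0).normal(size=(4, 40))  # 4 frames x 40 mel-bin activations
pooled = freq_max_pool(acts, pool=3)
print(pooled.shape)                 # (4, 13)
print(dropout(pooled).mean())
```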
Citations: 218
Investigation of multilingual deep neural networks for spoken term detection
Pub Date : 2013-09-03 DOI: 10.1109/ASRU.2013.6707719
K. Knill, M. Gales, S. Rath, P. Woodland, Chao Zhang, Shi-Xiong Zhang
The development of high-performance speech processing systems for low-resource languages is a challenging area. One approach to addressing the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to use bottleneck features, or hybrid systems, trained on multilingual data for speech-to-text (STT) systems. This paper presents an investigation into the application of these multilingual approaches to spoken term detection. Experiments were run using the IARPA Babel limited language pack corpora (~10 hours/language), with 4 languages for initial multilingual system development and an additional held-out target language. STT gains achieved by using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the training-set languages. Adapted hybrid systems performed slightly worse on average than the adapted Tandem systems. A language-independent acoustic model test on the target language showed that retraining or adapting the acoustic models to the target language is currently minimally needed to achieve reasonable performance.
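A minimal sketch of the Tandem idea, assuming a tiny randomly initialized MLP in place of a multilingually trained network: acoustic features are passed up to a narrow bottleneck layer, and the bottleneck activations are appended to the original features for the GMM-HMM system.

```python
# Toy tandem feature extraction: bottleneck activations appended to the input.
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden, bn_dim, n_frames = 39, 128, 26, 10

W1 = rng.normal(scale=0.1, size=(feat_dim, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, bn_dim))   # bottleneck layer weights

def bottleneck_features(x):
    h = np.tanh(x @ W1)
    return np.tanh(h @ W2)          # narrow bottleneck activations

x = rng.normal(size=(n_frames, feat_dim))           # e.g. PLP/MFCC + deltas
tandem = np.hstack([x, bottleneck_features(x)])     # features for the GMM-HMM
print(tandem.shape)                                 # (10, 65)
```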
Citations: 92
Modified splice and its extension to non-stereo data for noise robust speech recognition
Pub Date : 2013-07-15 DOI: 10.1109/ASRU.2013.6707725
D. S. P. Kumar, N. Prasad, Vikas Joshi, S. Umesh
In this paper, a modification to the training process of the popular SPLICE algorithm is proposed for noise-robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially unseen ones. Further, the modified framework is extended to non-stereo datasets, where clean and noisy training utterances are required but stereo counterparts are not. Finally, an MLLR-based, computationally efficient run-time noise adaptation method in the SPLICE framework is proposed. The modified SPLICE shows an 8.6% absolute improvement over SPLICE on Test C of the Aurora-2 database, and 2.93% overall. The non-stereo method shows 10.37% and 6.93% absolute improvements over the Aurora-2 and Aurora-4 baseline models, respectively. Run-time adaptation shows a 9.89% absolute improvement in the modified framework compared to SPLICE for Test C, and 4.96% overall with respect to standard MLLR adaptation on HMMs.
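For context, the sketch below shows the standard SPLICE enhancement step, x_hat = y + sum_k p(k|y) r_k, with toy posteriors and stereo-estimated biases; the paper's modification and its non-stereo extension are not reproduced here.

```python
# Standard SPLICE enhancement: noisy feature plus posterior-weighted biases.
import numpy as np

def splice_enhance(y, posteriors, biases):
    """y: (T, D) noisy features, posteriors: (T, K) p(k|y), biases: (K, D) r_k."""
    return y + posteriors @ biases

rng = np.random.default_rng(0)
T, D, K = 8, 13, 4
y = rng.normal(size=(T, D))
post = rng.dirichlet(np.ones(K), size=T)     # GMM component posteriors per frame
r = rng.normal(scale=0.1, size=(K, D))       # stereo-estimated bias per component
print(splice_enhance(y, post, r).shape)      # (8, 13)
```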
Citations: 1
NMF-based keyword learning from scarce data
Pub Date : 1900-01-01 DOI: 10.1109/ASRU.2013.6707762
B. Ons, J. Gemmeke, H. V. hamme
This research is part of a project aimed at developing a vocal user interface (VUI) that learns to understand its users, specifically persons with a speech impairment. The vocal interface adapts to the speech of the user by learning the vocabulary from interaction examples. Word learning is implemented through weakly supervised non-negative matrix factorization (NMF). The goal of this study is to investigate how word learning can be improved when the number of interaction examples is low. We demonstrate two approaches to train NMF models on scarce data: 1) training word models using smoothed training data, and 2) training word models that strictly correspond to the grounding information derived from a few interaction examples. We found that both approaches can substantially improve word learning from scarce training data.
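A minimal sketch of weakly supervised NMF for word learning, assuming random toy matrices and a simple stacked representation: a grounding (label) matrix is stacked on the acoustic matrix and both are factorized jointly with standard KL-divergence multiplicative updates, so each column of W links an acoustic pattern to a label. This is illustrative only, not the VUI system's actual features or supervision scheme.

```python
# Toy weakly supervised NMF: jointly factorize stacked grounding + acoustic data.
import numpy as np

rng = np.random.default_rng(0)
n_labels, n_acoustic, n_utts, n_words = 4, 30, 20, 4

V_label = (rng.random((n_labels, n_utts)) < 0.3).astype(float)   # weak grounding info
V = np.vstack([V_label, rng.random((n_acoustic, n_utts))]) + 1e-6

W = rng.random((n_labels + n_acoustic, n_words)) + 0.1
H = rng.random((n_words, n_utts)) + 0.1

for _ in range(50):  # multiplicative updates for KL-divergence NMF
    WH = W @ H + 1e-9
    H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-9)
    WH = W @ H + 1e-9
    W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + 1e-9)

print(np.round(W[:n_labels], 2))  # label part of each learned word model
```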
Citations: 3
A study of supervised intrinsic spectral analysis for TIMIT phone classification
Pub Date : 1900-01-01 DOI: 10.1109/ASRU.2013.6707739
Reza Sahraeian, Dirk Van Compernolle
Intrinsic Spectral Analysis (ISA) has been formulated within a manifold learning setting, allowing natural extension to out-of-sample data together with feature reduction in a learning framework. In this paper, we propose two approaches to improve the performance of supervised ISA, and we then examine the effect of applying a linear discriminant technique in the intrinsic subspace compared with the extrinsic one. To reduce complexity, we propose a preprocessing step that finds a small subset of data points that represents the manifold structure well; this is accomplished by maximizing the quadratic Renyi entropy. Furthermore, we use class-based graphs, which not only simplify the problem but can also be helpful in a classification task. Experimental results on the TIMIT phone classification task show that ISA features improve performance compared with traditional features, and that supervised discriminant techniques perform better in the ISA subspace than in conventional feature spaces.
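A minimal sketch of selecting a representative subset by maximizing the quadratic Renyi entropy of the selected points under a Gaussian kernel, using greedy forward selection; the toy data, kernel width, and subset size are illustrative assumptions, not the paper's procedure.

```python
# Greedy subset selection maximizing quadratic Renyi entropy,
# H2 = -log(mean_ij k(x_i, x_j)), under a Gaussian kernel (illustrative sketch).
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def renyi_greedy_subset(X, m, sigma=1.0):
    K = gaussian_kernel(X, sigma)
    selected = [int(np.argmin(K.sum(axis=1)))]        # start from the most isolated point
    while len(selected) < m:
        best, best_h = None, -np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            idx = selected + [i]
            h = -np.log(K[np.ix_(idx, idx)].mean())   # entropy of the candidate subset
            if h > best_h:
                best, best_h = i, h
        selected.append(best)
    return selected

X = np.random.default_rng(0).normal(size=(40, 2))
print(renyi_greedy_subset(X, m=8))
```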
Citations: 6