Pipelining Acoustic Model Training for Speech Recognition Using Storm

2013 Fifth International Conference on Computational Intelligence, Modelling and Simulation. Pub Date: 2013-09-24. DOI: 10.1109/CIMSIM.2013.42

D. Sitaram, Haripriya Srinivasaraghavan, Kapish Agarwal, Amritanshu Agrawal, N. Joshi, Debraj Ray

Citations: 1

Abstract

Speech recognition is increasingly used on mobile devices, which in turn has increased the need for new acoustic models covering various languages, dialects, accents, speakers, and environmental conditions. This involves training and adapting a huge number of acoustic models, some of them in real time. Training acoustic models is thus essential for speech recognition, because these models determine the accuracy and quality of the recognition process. This paper discusses the use of Storm, a distributed real-time computation system, to pipeline the creation of acoustic models by CMU Sphinx, an open-source software project for speech recognition and training. Software pipelining reduces the time required for training and optimizes system resource utilization, enabling large amounts of data to be trained in considerably less time than the conventional sequential process takes. Pipelining is achieved by grouping the steps of the training process into five stages and running each stage on an individual node in a Storm cluster. Acoustic models are thus created by training multiple streams of speech samples with the same SphinxTrain setup, which also improves training time and throughput.
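The throughput argument in the abstract can be illustrated with a small timing model. The sketch below is not the authors' implementation and does not use Storm's API; it is a generic model of a linear pipeline in which each stage handles one stream at a time. The function names and the stage durations are hypothetical, chosen only for illustration, and are not measurements from the paper.

```python
from typing import Sequence


def sequential_time(stage_times: Sequence[float], n_streams: int) -> float:
    """Total time when each stream runs through all stages before the next starts."""
    return n_streams * sum(stage_times)


def pipelined_time(stage_times: Sequence[float], n_streams: int) -> float:
    """Total time when each stage runs on its own node and streams overlap.

    A stream can enter stage j only after it has left stage j-1 and after
    the previous stream has left stage j.
    """
    # stage_free[j] = time at which stage j finished its most recent stream
    stage_free = [0.0] * len(stage_times)
    for _ in range(n_streams):
        t = 0.0  # time at which the current stream left the previous stage
        for j, duration in enumerate(stage_times):
            start = max(t, stage_free[j])
            t = start + duration
            stage_free[j] = t
    return stage_free[-1]


if __name__ == "__main__":
    # Hypothetical durations (arbitrary units) for five grouped training stages.
    stages = [2.0, 1.0, 3.0, 1.0, 1.0]
    for n in (1, 3, 10):
        print(n, sequential_time(stages, n), pipelined_time(stages, n))
```

In this model the pipelined total is the time for one stream to pass through all five stages plus one slowest-stage duration for every additional stream, so the gain over sequential training grows with the number of streams and is limited by the slowest stage.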
