Workshop on Spoken Language Technologies for Under-resourced Languages: Latest Papers

Advances in Low Resource ASR: A Deep Learning Perspective
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-4
Hardik B. Sailor, Ankur T. Patil, H. Patil
Developing Automatic Speech Recognition (ASR) systems for Low Resource (LR) languages is an active research area. Deep learning approaches have significantly advanced ASR research, producing state-of-the-art results compared to conventional approaches. However, applying such approaches to LR languages remains challenging because they require a large amount of training data. Recently, techniques such as data augmentation, multilingual and cross-lingual approaches, and transfer learning have made it possible to train deep learning architectures for LR languages. This paper presents an overview of deep learning-based approaches for building ASR for LR languages. Recent projects and events organized to support the development of ASR and related applications in this direction are also discussed. This paper may motivate researchers interested in working on low resource ASR using deep learning techniques. The approaches described here could also be useful in related applications, such as audio search.
Citations: 5
Mining Training Data for Language Modeling Across the World's Languages
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-13
Manasa Prasad, Theresa Breiner, D. Esch
Citations: 12
Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-1
Bin Wu, S. Sakti, Jinsong Zhang, Satoshi Nakamura
Inspired by infant language acquisition, unsupervised subword discovery for zero-resource languages has gained attention recently. The Dirichlet Process Gaussian Mixture Model (DPGMM) achieves top results as evaluated by the ABX discrimination test. However, the DPGMM is overly sensitive to acoustic variation and often produces too many types of subword units and a relatively high-dimensional posteriorgram, which implies a high computational cost for learning and inference, as well as a greater tendency to overfit. This paper proposes applying functional load to reduce the number of subword units produced by the DPGMM. We greedily merge the pairs of units with the lowest functional load, which causes the least information loss for the language. Results on the Xitsonga corpus with the official setting of Zerospeech 2015 show that we can reduce the number of subword units by more than two thirds without hurting the ABX error rate. The resulting number of units is close to the number of phonemes in human languages.
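The greedy merging described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define functional load, so the sketch assumes one common entropy-based definition, FL(a, b) = (H - H_merged) / H, computed over n-gram counts of the unit label sequences. The function names, the n-gram order, and the `target_units` stopping rule are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def ngram_entropy(sequences, n=2):
    """Shannon entropy (bits) of the empirical n-gram distribution."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def merge_units(sequences, a, b):
    """Relabel every occurrence of unit b as unit a."""
    return [[a if u == b else u for u in seq] for seq in sequences]


def functional_load(sequences, a, b, n=2):
    """Assumed definition: relative entropy lost when a and b are merged."""
    base = ngram_entropy(sequences, n)
    if base == 0.0:
        return 0.0
    merged = ngram_entropy(merge_units(sequences, a, b), n)
    return (base - merged) / base


def greedy_merge(sequences, target_units, n=2):
    """Repeatedly merge the pair with the lowest functional load.

    Stops when at most target_units distinct labels remain
    (a hypothetical stopping rule, not taken from the paper).
    """
    sequences = [list(seq) for seq in sequences]
    units = sorted({u for seq in sequences for u in seq})
    while len(units) > target_units:
        a, b = min(
            combinations(units, 2),
            key=lambda pair: functional_load(sequences, pair[0], pair[1], n),
        )
        sequences = merge_units(sequences, a, b)
        units.remove(b)
    return sequences, units
```

Each iteration evaluates every remaining pair, so the cost grows quadratically with the number of units; that is acceptable for a sketch but would likely need caching for inventories with hundreds of DPGMM clusters.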
Citations: 10
Development of Assamese Continuous Speech Recognition System
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-46
Tanmay Bhowmik, S. Mandal
Citations: 0
Analysis and Comparison of Features for Text-Independent Bengali Speaker Recognition
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-57
S. Das, P. Das
Citations: 0
Improved Language Identification Using Stacked SDC Features and Residual Neural Network
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-44
R. Vuddagiri, Hari Krishna Vydana, A. Vuppala
Citations: 9
Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-6
Arun Baby, S. KarthikPandiaD., H. Murthy
Building accurate acoustic models for low resource languages is the focus of this paper. Acoustic models are likely to be accurate provided the phone boundaries are determined accurately. Conventional flat-start Viterbi phone alignment (where only utterance-level transcriptions are available) results in poor phone boundaries, as the boundaries are not explicitly modeled in any statistical machine learning system. The focus of the effort in this paper is to explicitly model phrase boundaries using acoustic cues obtained through signal processing. A phrase is made up of a sequence of words, where each word is made up of a sequence of syllables. Syllable boundaries are detected using signal processing. The waveform corresponding to an utterance is spliced at a phrase boundary when that boundary matches a syllable boundary. Gaussian mixture model - hidden Markov model (GMM-HMM) training is performed phrase by phrase, rather than utterance by utterance. Training on these short phrases yields better acoustic models. The resulting alignment is then fed to a DNN to enable better discrimination between phones. During the training process, the syllable boundaries (obtained using signal processing) are restored in every iteration. A relative improvement in WER over the baseline is observed for three Indian languages, namely Gujarati, Tamil, and Telugu.
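The splicing rule in this abstract (cut the waveform at a phrase boundary only when it coincides with a syllable boundary) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: boundary times are taken as already available in seconds, the matching tolerance `tol` is a hypothetical parameter, and the signal processing that detects syllable boundaries is not shown.

```python
def matched_phrase_boundaries(phrase_bounds, syllable_bounds, tol=0.02):
    """Keep phrase boundaries (seconds) within tol of some syllable boundary.

    tol is an assumed tolerance; the abstract does not specify one.
    """
    return [
        p for p in phrase_bounds
        if any(abs(p - s) <= tol for s in syllable_bounds)
    ]


def splice(samples, sample_rate, cut_times):
    """Split a sample sequence into segments at the given times (seconds)."""
    cuts = sorted({round(t * sample_rate) for t in cut_times})
    # Ignore cuts that fall on or outside the ends of the signal.
    cuts = [c for c in cuts if 0 < c < len(samples)]
    edges = [0] + cuts + [len(samples)]
    return [samples[a:b] for a, b in zip(edges, edges[1:])]
```

For example, with phrase boundary candidates at 0.5 s and 1.2 s and syllable boundaries at 0.49 s and 0.9 s, only the 0.5 s candidate is kept, and the utterance is split into two shorter training segments at that point.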
Citations: 1
JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children's Speech
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-51
Stefan Watson, André Coy
Citations: 0
Implementation of Concatenation Technique for Low Resource Text-To-Speech System Based on Marathi Talking Calculator
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-16
Monica R. Mundada, Sangramsing Kayte, P. Das
A sound grasp of basic mathematical concepts opens up numerous opportunities in life for every individual, including visually impaired people. Assistive technology makes people with disabilities more independent and helps them avoid barriers in education and employment. This research focuses on designing an Android-based application, a talking calculator, for Marathi, a low resource language. The novelty of this work lies in developing both the application and the Marathi number corpus. Marathi is an Indo-Aryan language spoken by approximately 6.99 million speakers in India and is the third most widely spoken language after Bengali and Telugu, but because it lacks linguistic resources such as grammars, POS taggers, and corpora, it falls into the category of low resource languages. The front end of the application shows the screen of a basic calculator with numerals displayed in Marathi. At runtime, each number is spoken as its key is pressed, and the operation to be performed is also spoken. Concatenative synthesis is used to speak the decimal places of the output number. The result is spoken with the proper place value of each digit in Marathi. The system achieves an accuracy of 95.5%. The average run time of the application is 2.64 s. Feedback on and reviews of the application were also collected from real end users, i.e., blind people.
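As a rough illustration of how a result could be turned into a sequence of recorded units for concatenation, the Python sketch below decomposes a number by place value. Everything in it is an assumption rather than the paper's design: the unit names (`num_34`, `place_thousand`, `point`) are hypothetical labels for recordings, the Indian grouping of crore, lakh, thousand, and hundred is assumed, a separate recording is assumed for every value from 0 to 99, and decimal places are assumed to be read digit by digit. Negative numbers are not handled.

```python
# Place values in the Indian numbering system, largest first.
PLACES = [
    (10_000_000, "crore"),
    (100_000, "lakh"),
    (1_000, "thousand"),
    (100, "hundred"),
]


def integer_units(n):
    """Return hypothetical recording labels for a non-negative integer."""
    if n == 0:
        return ["num_0"]
    units = []
    for value, name in PLACES:
        count, n = divmod(n, value)
        if count:
            # The crore count can exceed 99, so decompose it recursively.
            units.extend(integer_units(count))
            units.append(f"place_{name}")
    if n:
        units.append(f"num_{n}")  # remaining value is in 1..99
    return units


def number_units(text):
    """Labels for a decimal string such as '1234.05'."""
    whole, _, frac = text.partition(".")
    units = integer_units(int(whole))
    if frac:
        units.append("point")
        units.extend(f"num_{d}" for d in frac)
    return units
```

A playback layer would then look up one audio clip per label and play the clips back to back, which is the concatenation step itself.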
Citations: 0
Neural Networks-based Automatic Speech Recognition for Agricultural Commodity in Gujarati Language
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-34
Hardik B. Sailor, H. Patil
In this paper, we present the development of an Automatic Speech Recognition (ASR) system as part of a speech-based access system for agricultural commodities in Gujarati, a low resource language. We propose using neural networks for language modeling, acoustic modeling, and feature learning from raw speech signals. The speech database of agricultural commodities was collected from farmers in various villages of Gujarat state (India). The database contains dialectal variation and real, noisy acoustic environments. Acoustic modeling is performed using Time Delay Neural Networks (TDNN). The auditory feature representation is learned using a Convolutional Restricted Boltzmann Machine (ConvRBM) and the Teager Energy Operator (TEO). Language model (LM) rescoring is performed using Recurrent Neural Networks (RNN). RNNLM rescoring provides an absolute reduction of 0.69-1.18 in % WER for all the feature sets compared to the bigram LM. The system combination of ConvRBM and Mel filterbank features further improves ASR performance compared to the baseline TDNN with Mel filterbank features (5.4 % relative reduction in WER). The statistical significance of the proposed approach is assessed using a bootstrap-based % Probability of Improvement (POI) measure.
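The abstract quotes both absolute and relative WER reductions and a bootstrap-based probability of improvement. The Python sketch below shows one way such quantities are commonly computed; it is not the paper's evaluation code. The per-utterance error counts, the function names, the number of bootstrap replicates, and the seed are all hypothetical, and the POI here is simply the percentage of resampled test sets on which the new system makes fewer errors than the baseline.

```python
import random


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent (absolute reduction / baseline)."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def probability_of_improvement(errs_base, errs_new, n_boot=1000, seed=0):
    """Percentage of bootstrap replicates in which the new system wins.

    errs_base and errs_new hold per-utterance error counts for the same
    utterances. Utterances are resampled with replacement; because both
    systems see the same resampled utterances, the reference word count
    is identical, so comparing error totals is equivalent to comparing WER.
    """
    rng = random.Random(seed)
    n = len(errs_base)
    wins = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(errs_new[i] for i in idx) < sum(errs_base[i] for i in idx):
            wins += 1
    return 100.0 * wins / n_boot
```

The distinction matters when reading the numbers: an absolute reduction is a difference in WER points, while a relative reduction divides that difference by the baseline WER, so the two figures in the abstract are not directly comparable.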
Citations: 0