“Spanish Políglota”: an automatic Speech Recognition system based on HMM

Jonathan A. Zea, Josafá Aguiar
{"title":"“Spanish Políglota”: an automatic Speech Recognition system based on HMM","authors":"Jonathan A. Zea, Josafá Aguiar","doi":"10.1109/ICI2ST51859.2021.00011","DOIUrl":null,"url":null,"abstract":"The goal of this ASR system is to be able to recognize audio queries that request static translation of a given Spanish word into a specified language. We call this ASR system as the Spanish Políglota. The pronunciation dictionary for the language model is obtained by applying grapheme to phoneme conversion. It was developed via Festival Speech Synthesis Scheme scripts and the SPPAS Spanish lexicon. The possible audio queries are restricted by a BNF grammar we designed for this project. A triphone acoustic model was generated from a set of 1621 words audio recordings. This acoustic model is based on a N-gram model that estimates its probabilities based on the maximum likelihood estimation MLE. We evaluated the prediction of individual words, as well as of synthetic phrases. We generated 1577 synthetic phrases concatenating the words of our audio set. The performance was also measured over a new set of audio recordings from a different speaker. Evaluation of isolated word recognition achieved 77.91% of correct predictions. Nevertheless, the performance dropped when evaluating the synthetic phrases as well as the second speaker’s speech. We consider it is an initial step towards the development of a fully functional automatic speech recognition system.","PeriodicalId":148844,"journal":{"name":"2021 Second International Conference on Information Systems and Software Technologies (ICI2ST)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Second International Conference on Information Systems and Software Technologies (ICI2ST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICI2ST51859.2021.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The goal of this ASR system is to be able to recognize audio queries that request static translation of a given Spanish word into a specified language. We call this ASR system as the Spanish Políglota. The pronunciation dictionary for the language model is obtained by applying grapheme to phoneme conversion. It was developed via Festival Speech Synthesis Scheme scripts and the SPPAS Spanish lexicon. The possible audio queries are restricted by a BNF grammar we designed for this project. A triphone acoustic model was generated from a set of 1621 words audio recordings. This acoustic model is based on a N-gram model that estimates its probabilities based on the maximum likelihood estimation MLE. We evaluated the prediction of individual words, as well as of synthetic phrases. We generated 1577 synthetic phrases concatenating the words of our audio set. The performance was also measured over a new set of audio recordings from a different speaker. Evaluation of isolated word recognition achieved 77.91% of correct predictions. Nevertheless, the performance dropped when evaluating the synthetic phrases as well as the second speaker’s speech. We consider it is an initial step towards the development of a fully functional automatic speech recognition system.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
“西班牙语Políglota”:基于HMM的自动语音识别系统
这个ASR系统的目标是能够识别请求将给定的西班牙语单词静态翻译成指定语言的音频查询。我们称这个ASR系统为西班牙语Políglota。将字素转换为音素,得到语言模型的发音字典。它是通过节日语音合成方案脚本和SPPAS西班牙语词典开发的。可能的音频查询受到我们为这个项目设计的BNF语法的限制。从一组1621个单词的录音中生成了一个三联音声学模型。该声学模型基于N-gram模型,该模型基于最大似然估计MLE估计其概率。我们评估了对单个单词和合成短语的预测。我们生成了1577个合成短语,将音频集的单词连接起来。这种表现也通过一组来自不同扬声器的新录音进行了测量。孤立词识别的评估准确率达到77.91%。然而,在评估合成短语和第二个说话者的演讲时,表现有所下降。我们认为这是迈向全功能自动语音识别系统的第一步。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Empirical Strategies in Software Engineering Research: A Literature Survey A Case Study: Developing reusable Learning Objects Constraining Interseismic Deformation of Northern Ecuador using Interferometry from Sentinel-1 Data Glaucoma detection through digital processing from fundus images using MATLAB A First Spotlight: Introducing Educational Robotics in the Ecuadorian Public School
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1