Development online models for automatic speech recognition systems with a low data level

OZh Mamyrbayev, D. Oralbekova, K. Alimhan, M. Othman, B. Zhumazhanov
Annals of Mathematics and Physics, published 2022-08-23. DOI: 10.17352/amp.000049

Abstract

Speech recognition is a rapidly growing field of machine learning. Conventional automatic speech recognition (ASR) systems were built from independent components, namely an acoustic model, a language model and a lexicon, which were tuned and trained separately. The acoustic model predicts context-dependent phoneme states, while the language model and the lexicon determine the most probable sequences of spoken words. Advances in deep learning have driven progress in many scientific areas, including speech recognition. Today, the most popular speech recognition systems are based on an end-to-end (E2E) architecture, which trains the components of a traditional system jointly rather than in isolation and represents the whole system as a single neural network, in contrast to the traditional pipeline of several independent modules. An E2E system maps acoustic signals directly to a sequence of labels, without intermediate states and without post-processing of the output, which makes it easy to implement. Among E2E systems, online models, which output the word sequence directly from the input audio in real time, are now especially popular. This article provides a detailed overview of popular online E2E models: RNN-T (Recurrent Neural Network Transducer), Neural Transducer (NT) and Monotonic Chunkwise Attention (MoChA). To date, no online models have been developed for Kazakh speech recognition, and the models above have not been studied for low-resource languages such as Kazakh. Systems based on these models were therefore trained to recognize Kazakh speech. The results show that all three models recognize Kazakh speech well without any external components.
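To make the idea of an "online" E2E model more concrete, here is a minimal sketch of greedy decoding for RNN-T, one of the three models named above. It shows how a transducer consumes encoder frames one at a time and emits a label as soon as it is chosen, advancing to the next frame whenever the blank symbol wins. Everything concrete here is an illustrative assumption, not the authors' implementation: the function name `rnnt_greedy_stream`, the 4-symbol vocabulary with blank at index 0, and the hand-written `toy_predict` / `toy_joint` stand-ins for trained prediction and joint networks. A real system would use trained networks and, typically, beam search.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

BLANK = 0  # assumption: index 0 of the output vocabulary is the blank symbol

Vector = List[float]


def rnnt_greedy_stream(
    frames: Iterable[Sequence[float]],
    predict: Callable[[List[int]], Vector],
    joint: Callable[[Sequence[float], Vector], Vector],
    max_symbols_per_frame: int = 3,
) -> Iterator[int]:
    """Greedy RNN-T decoding that yields each label as soon as it is chosen.

    frames  -- encoder outputs, consumed one frame at a time (online)
    predict -- prediction network: label history -> output vector
    joint   -- joint network: (encoder frame, prediction output) -> logits
    """
    hyp: List[int] = []          # non-blank labels emitted so far
    g = predict(hyp)             # prediction output for the empty history
    for f_t in frames:
        # Emit non-blank labels on this frame until blank (or the cap) is hit.
        for _ in range(max_symbols_per_frame):
            logits = joint(f_t, g)
            k = max(range(len(logits)), key=logits.__getitem__)  # argmax
            if k == BLANK:       # blank: advance to the next encoder frame
                break
            hyp.append(k)        # non-blank: extend the hypothesis ...
            g = predict(hyp)     # ... and update the prediction output
            yield k


# --- Hypothetical toy networks over the vocabulary <blank>, a, b, c ---
LABELS = ["<blank>", "a", "b", "c"]


def toy_predict(hyp: List[int]) -> Vector:
    """Stand-in for a trained prediction network: it only penalises
    repeating the most recent label."""
    g = [0.0] * len(LABELS)
    if hyp:
        g[hyp[-1]] = -10.0
    return g


def toy_joint(f_t: Sequence[float], g: Vector) -> Vector:
    """Stand-in for a trained joint network: element-wise sum of logits."""
    return [a + b for a, b in zip(f_t, g)]


# Toy "encoder outputs" for four audio frames (made-up numbers).
FRAMES = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 2.5],
    [1.0, 0.0, 2.0, 0.0],
]

if __name__ == "__main__":
    for label in rnnt_greedy_stream(FRAMES, toy_predict, toy_joint):
        print(LABELS[label])  # prints a, c, b on separate lines
```

With these toy numbers the second frame is dominated by blank and emits nothing, so the output is `a`, `c`, `b`. Writing the decoder as a generator is a deliberate choice: each label is available before later frames have been read, which is the property that separates online models from full-utterance decoding. The `max_symbols_per_frame` cap guards against emitting unboundedly many labels on one frame. The chunk-based emission of the Neural Transducer and the attention mechanism of MoChA are not shown here.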