Construction of Language Models for Uzbek Language

N. Mamatov, N. Niyozmatova, A. Samijonov, B. Samijonov
{"title":"Construction of Language Models for Uzbek Language","authors":"N. Mamatov, N. Niyozmatova, A. Samijonov, B. Samijonov","doi":"10.1109/ICISCT55600.2022.10146788","DOIUrl":null,"url":null,"abstract":"A language model is a set of restrictions on the sequence of words allowed in a given language, and these restrictions can be expressed, for example, by the rules of a generative grammar or by a statistic of each pair of words evaluated in a given language. simple educational building. Although there are words with similar-sounding phonemes, it is usually not difficult for people to recognize the word. It mostly has to do with knowing the context and being very good at what words or phrases might be in it. The purpose of the language model is to provide context to the speech recognition system. The language model determines what words are allowed in the system language and in what order they can occur.Language models are trained, i.e., n-gram probabilities are estimated by observing sequences of words in a text corpus. Confusion reduction is typically performed on training data containing millions of word tokens. But, as has been observed, reducing confusion does not improve speech recognition results. Therefore, algorithms should be used that improve language models in terms of their impact on speech recognition, especially language models that determine the probability distribution of the speaker’s next spoken words given the speech history.In recent years, many speech recognition systems have been developed that use language models created for specific languages. And the use of language models in speech recognition serves to increase the efficiency of speech recognition. Many researchers have developed a traditional language model for the Uzbek language [8] –[12], but this model does not give the expected results. This requires the construction of other models for the Uzbek language. 
This article provides information about natural language, building natural language models, and applying them to speech recognition. Discusses research related to the construction of natural language models, problems that arise in the construction of statistical models, and approaches that can be used to solve them.","PeriodicalId":332984,"journal":{"name":"2022 International Conference on Information Science and Communications Technologies (ICISCT)","volume":"513 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Information Science and Communications Technologies (ICISCT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICISCT55600.2022.10146788","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A language model is a set of restrictions on the sequences of words allowed in a given language. These restrictions can be expressed, for example, by the rules of a generative grammar or by statistics over word pairs estimated from a training corpus. Although many words contain similar-sounding phonemes, people usually have little difficulty recognizing the intended word: this ability mostly comes from knowing the context and being very good at predicting which words or phrases are likely to occur in it. The purpose of a language model is to provide this context to a speech recognition system. The language model determines which words are allowed in the system's language and in what order they may occur. Language models are trained, i.e., n-gram probabilities are estimated, by observing sequences of words in a text corpus. Perplexity reduction is typically performed on training data containing millions of word tokens. However, as has been observed, reducing perplexity alone does not necessarily improve speech recognition results. Therefore, algorithms should be used that improve language models in terms of their impact on speech recognition, especially language models that determine the probability distribution over the speaker's next words given the speech history. In recent years, many speech recognition systems have been developed that use language models created for specific languages, and the use of such models serves to increase the accuracy of speech recognition. Many researchers have developed traditional language models for the Uzbek language [8]–[12], but these models do not give the expected results. This motivates the construction of other models for the Uzbek language.

This article provides information about natural language, the building of natural-language models, and their application to speech recognition. It discusses research related to the construction of natural-language models, problems that arise in the construction of statistical models, and approaches that can be used to solve them.
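The training procedure the abstract describes, estimating n-gram probabilities from observed word sequences and measuring perplexity, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three-sentence toy corpus (standing in for an Uzbek text corpus), the add-one smoothing choice, and the function names are all assumptions introduced here.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a large Uzbek text corpus.
corpus = [
    ["men", "kitob", "o'qiyman"],
    ["men", "kitob", "yozaman"],
    ["u", "kitob", "o'qiydi"],
]

# Count unigrams and bigrams, padding each sentence with boundary markers.
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def bigram_prob(prev, word):
    # Add-one (Laplace) smoothing so unseen word pairs get nonzero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentence):
    # Perplexity = 2 ** (-average log2 probability per predicted token).
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = sum(math.log2(bigram_prob(p, w))
                   for p, w in zip(tokens, tokens[1:]))
    return 2 ** (-log_prob / (len(tokens) - 1))

print(round(perplexity(["men", "kitob", "o'qiyman"]), 2))  # → 4.17
```

A lower perplexity on held-out text means the model assigns higher probability to the observed word sequences; as the abstract notes, however, this reduction by itself does not guarantee better speech recognition accuracy.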