Initial decoding with minimally augmented language model for improved lattice rescoring in low resource ASR

Savitha Murthy, Dinkar Sitaram
{"title":"Initial decoding with minimally augmented language model for improved lattice rescoring in low resource ASR","authors":"Savitha Murthy, Dinkar Sitaram","doi":"10.1007/s12046-024-02520-0","DOIUrl":null,"url":null,"abstract":"<p>Automatic speech recognition systems for low-resource languages typically have smaller corpora on which the language model is trained. Decoding with such a language model leads to a high word error rate due to the large number of out-of-vocabulary words in the test data. Larger language models can be used to rescore the lattices generated from initial decoding. This approach, however, gives only a marginal improvement. Decoding with a larger augmented language model, though helpful, is memory intensive and not feasible for low resource system setup. The objective of our research is to perform initial decoding with a minimally augmented language model. The lattices thus generated are then rescored with a larger language model. We thus obtain a significant reduction in error for low-resource Indic languages, namely, Kannada and Telugu. This paper addresses the problem of improving speech recognition accuracy with lattice rescoring in low-resource languages where the baseline language model is not sufficient for generating inclusive lattices. We minimally augment the baseline language model with unigram counts of words that are present in a larger text corpus of the target language but absent in the baseline. The lattices generated after decoding with a minimally augmented baseline language model are more comprehensive for rescoring. We obtain 21.8% (for Telugu) and 41.8% (for Kannada) relative word error reduction with our proposed method. This reduction in word error rate is comparable to 21.5% (for Telugu) and 45.9% (for Kannada) relative word error reduction obtained by decoding with full Wikipedia text augmented language mode while our approach consumes only 1/8th the memory. We demonstrate that our method is comparable with various text selection-based language model augmentation and also consistent for data sets of different sizes. Our approach is applicable for training speech recognition systems under low resource conditions where speech data and compute resources are insufficient, while there is a large text corpus that is available in the target language. Our research involves addressing the issue of out-of-vocabulary words of the baseline in general and does not focus on resolving the absence of named entities. Our proposed method is simple and yet computationally less expensive.</p>","PeriodicalId":21498,"journal":{"name":"Sādhanā","volume":"58 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sādhanā","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s12046-024-02520-0","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Automatic speech recognition systems for low-resource languages typically have smaller corpora on which the language model is trained. Decoding with such a language model leads to a high word error rate due to the large number of out-of-vocabulary words in the test data. Larger language models can be used to rescore the lattices generated from initial decoding; this approach, however, gives only a marginal improvement. Decoding with a larger augmented language model, though helpful, is memory intensive and not feasible for a low-resource system setup. The objective of our research is to perform initial decoding with a minimally augmented language model. The lattices thus generated are then rescored with a larger language model. We thereby obtain a significant reduction in error for low-resource Indic languages, namely Kannada and Telugu. This paper addresses the problem of improving speech recognition accuracy with lattice rescoring in low-resource languages where the baseline language model is not sufficient for generating inclusive lattices. We minimally augment the baseline language model with unigram counts of words that are present in a larger text corpus of the target language but absent from the baseline. The lattices generated after decoding with the minimally augmented baseline language model are more comprehensive for rescoring. We obtain 21.8% (for Telugu) and 41.8% (for Kannada) relative word error reduction with our proposed method. This reduction in word error rate is comparable to the 21.5% (for Telugu) and 45.9% (for Kannada) relative word error reduction obtained by decoding with a language model augmented with the full Wikipedia text, while our approach consumes only 1/8th the memory. We demonstrate that our method is comparable with various text selection-based language model augmentation approaches and is consistent across data sets of different sizes. Our approach is applicable for training speech recognition systems under low-resource conditions where speech data and compute resources are insufficient but a large text corpus is available in the target language. Our research addresses the issue of out-of-vocabulary words relative to the baseline in general and does not focus on resolving the absence of named entities. Our proposed method is simple yet computationally inexpensive.
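The augmentation step described in the abstract can be illustrated with a short sketch: collect unigram counts for words that appear in a larger target-language corpus (for example, Wikipedia text) but are missing from the baseline language model's vocabulary. The file names below are hypothetical, and the final merge of these counts into the baseline n-gram model (for example, with SRILM or KenLM) is omitted; this is an illustrative outline under those assumptions, not the authors' exact pipeline.

from collections import Counter

def oov_unigram_counts(baseline_vocab_path, large_corpus_path):
    """Count unigrams in a large text corpus for words absent from the
    baseline language model's vocabulary. These counts can then be merged
    into the baseline LM's unigram table so that initial decoding can
    hypothesize otherwise out-of-vocabulary words."""
    with open(baseline_vocab_path, encoding="utf-8") as f:
        baseline_vocab = {line.split()[0] for line in f if line.strip()}

    counts = Counter()
    with open(large_corpus_path, encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                if word not in baseline_vocab:
                    counts[word] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    oov_counts = oov_unigram_counts("baseline_vocab.txt", "wiki_corpus.txt")
    with open("oov_unigram_counts.txt", "w", encoding="utf-8") as out:
        for word, count in oov_counts.most_common():
            out.write(f"{word}\t{count}\n")

Only the unigram counts of the new words are added, which keeps the augmented model close to the size of the baseline; the larger language model is reserved for rescoring the richer lattices produced in the first decoding pass.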

