SymSpell4Burmese: Symmetric Delete Spelling Correction Algorithm (SymSpell) for Burmese Spelling Checking

Ei Phyu Phyu Mon, Ye Kyaw Thu, Than Than Yu, Aye Wai Oo
2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), published 2021-12-21
DOI: 10.1109/iSAI-NLP54397.2021.9678171 (https://doi.org/10.1109/iSAI-NLP54397.2021.9678171)
Citations: 2

Abstract

A spell checker is a crucial natural language processing (NLP) tool, and its importance has grown with the increase in text-based communication at work, information retrieval, fraud detection, search engines, social media, and research. In this paper, we study automatic spelling checking for Burmese by applying the Symmetric Delete Spelling Correction Algorithm (SymSpell). We ran experiments with an open-source SymSpell Python library, applying our Burmese spelling training corpus, which is still under development, together with four frequency dictionaries to ten error types. In the error detection phase, an N-gram language model is used to check the spelling training corpus against a dictionary. In the correction phase, SymSpell proposes candidate corrections within a specified maximum edit distance of the misspelled word. Once the candidates are generated, the best correction in the given context is chosen automatically: the candidate with the highest frequency at the minimum edit distance. We investigated performance on each error type and studied how the importance of the dictionary depends on the average term length and the maximum edit distance for a SymSpell-based Burmese spell checker. Moreover, we observed that syllable-level segmentation with a maximum edit distance of 3 gives faster and higher-quality results than word-level segmentation.
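
The candidate generation and ranking steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code and not the API of the SymSpell library they used: the function names (`deletes`, `build_index`, `lookup`), the toy English dictionary, and its frequencies are all hypothetical, and plain Levenshtein distance stands in for the distance measure of the actual library.

```python
from collections import defaultdict


def deletes(term, max_distance):
    """Return term plus every variant with up to max_distance units deleted."""
    results = {term}
    frontier = {term}
    for _ in range(max_distance):
        next_frontier = set()
        for t in frontier:
            for i in range(len(t)):
                next_frontier.add(t[:i] + t[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def levenshtein(a, b):
    """Plain Levenshtein distance (an assumption; used to verify candidates)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(freq, max_distance):
    """Map each delete variant of a dictionary term back to that term."""
    index = defaultdict(set)
    for term in freq:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def lookup(word, freq, index, max_distance):
    """Return (term, distance, frequency) tuples, best candidate first.

    Candidates are dictionary terms that share a delete variant with the
    input. They are ranked by smallest edit distance, then by highest
    frequency, mirroring the selection rule stated in the abstract.
    """
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    scored = []
    for term in candidates:
        dist = levenshtein(word, term)
        if dist <= max_distance:
            scored.append((dist, -freq[term], term))
    scored.sort()
    return [(term, dist, -neg_freq) for dist, neg_freq, term in scored]


# Hypothetical toy dictionary: term -> corpus frequency.
FREQ = {"spell": 50, "spill": 20, "shell": 80, "smell": 30}
INDEX = build_index(FREQ, max_distance=1)

# "spell" and "spill" are both one edit away from "spll";
# the more frequent term is ranked first.
suggestions = lookup("spll", FREQ, INDEX, max_distance=1)
```

Because `deletes` and `levenshtein` only slice and compare sequence elements, the same sketch also accepts tuples of syllables instead of character strings, which is one way to picture the difference between syllable-level and word-level units; how the paper actually segments Burmese text is not shown here.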