Multi Rule-based and Corpus-based for Sundanese Stemmer

A. Sutedi, M. Nasrulloh, Rickard Elsen
Journal: JOIN Jurnal Online Informatika
Published: 2022-12-29
DOI: 10.15575/join.v7i2.846 (https://doi.org/10.15575/join.v7i2.846)

Abstract

The purpose of this study is to develop a stemming method that combines several techniques: morphological rules (affix and pro-lexeme removal), syllable (canonical) patterns, and corpus data against which the final stemming result is compared. The algorithm first checks the number of characters in the string and removes affixes, then checks the syllable pattern of the stripped result, and finally compares it with the corpus data, which determines the final stem. In this study, the corpus data were taken from a Sundanese dictionary of single words used as root words and from a dataset extracted from an online Sundanese magazine. The results show that affix and pro-lexeme stripping removes the corresponding affixes and pro-lexemes, that comparing words against a syllable pattern yields the base words quickly, and that using the corpus improves accuracy and reduces the over-stemming problems that occur during stemming.
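
The pipeline described in the abstract (length check, affix stripping, syllable-pattern check, corpus lookup) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the affix lists are a small illustrative subset, the `MIN_LENGTH` threshold, the canonical syllable shapes (V, VC, CV, CVC), the tiny `LEXICON`, and all function names are assumptions made for the example, and pro-lexeme removal is omitted. Returning the word unchanged when no candidate is found in the lexicon is likewise an assumed fallback.

```python
import re

# Illustrative affix subsets only; NOT the rule set used in the paper.
PREFIXES = ("di", "ka", "pa", "sa")
SUFFIXES = ("keun", "eun", "an", "na")
MIN_LENGTH = 4  # hypothetical threshold for the initial length check

# Assumed canonical syllable shapes: V, VC, CV, CVC.
CANONICAL = re.compile(r"(?:C?VC?)+")

# Stand-in for the dictionary/magazine corpus of root words (assumption).
LEXICON = {"dahar", "baca", "kana"}


def cv_pattern(word):
    """Map a word to a consonant/vowel string, e.g. 'dahar' -> 'CVCVC'."""
    # Treat the digraphs 'ng' and 'ny' as single consonants.
    word = word.replace("ng", "N").replace("ny", "N")
    return "".join("V" if ch in "aeiou" else "C" for ch in word)


def has_canonical_pattern(word):
    """True if the word can be split into the assumed syllable shapes."""
    return CANONICAL.fullmatch(cv_pattern(word)) is not None


def strip_candidates(word):
    """Return the word and its affix-stripped variants, longest first."""
    heads = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    found = set()
    for head in heads:
        found.add(head)
        found.update(head[:-len(s)] for s in SUFFIXES if head.endswith(s))
    return sorted(found, key=lambda c: (-len(c), c))


def stem(word, lexicon=LEXICON):
    """Length check, affix stripping, syllable check, then corpus lookup."""
    word = word.lower()
    if len(word) < MIN_LENGTH or word in lexicon:
        return word  # too short to strip, or already a known root
    for candidate in strip_candidates(word):
        if has_canonical_pattern(candidate) and candidate in lexicon:
            return candidate
    return word  # assumed fallback: leave unknown words unchanged


if __name__ == "__main__":
    for w in ("didahar", "dahareun", "dibacakeun", "kana"):
        print(w, "->", stem(w))
```

In this sketch the lexicon lookup is what guards against over-stemming: "kana" begins with the prefix-like string "ka" and ends with the suffix-like string "na", but because it is itself in the lexicon it is returned untouched rather than being reduced to "na" or "ka".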