Extraction of Malay Root Word that Starts with Letter P in Malay e-Khutbah using Rule Based

Nurhilyana Anuar, Zamri Abu Bakar, Normaly Kamal Ismail
International Journal of Software Engineering and Computer Systems, published 2023-01-31
DOI: 10.15282/ijsecs.9.1.2023.4.0108
Citations: 0

Abstract

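Stemming is an important process in text processing, especially in Natural Language Processing (NLP): it extracts the root word from affixed words in a text. It also helps extract useful information for many research areas, such as Information Retrieval. Several stemming algorithms have been discussed in previous studies; however, studies on Malay stemming are limited, as is the amount of experimental data they use. This study focuses on the stemming process of a Malay stemming algorithm, applying a rule-based algorithm to a larger dataset of Malay text. The stemming process uses a syntactic, linguistic rule-based method that removes prefixes, suffixes, and combined prefixes and suffixes. The training dataset consists of 3,233 sentences from e-Khutbah texts. The algorithm was evaluated by measuring precision, recall, and F-measure, and it showed promising results on the total dataset used in each test, with precision, recall, and F-measure rising to 95%, 97%, and 97% respectively. The improved stemming process has a significant impact on Malay text processing and, in general, improves the performance of NLP applications.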
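As an illustration only, the rule-based affix stripping summarised in the abstract can be sketched in Python. Everything in this sketch is an assumption rather than the authors' algorithm: the toy ROOT_WORDS dictionary, the affix lists and their order, the helper names strip_prefix, stem and evaluate, and the counting scheme used for precision, recall and F-measure. It tries the three rule groups the abstract names (suffix removal, prefix removal, and both together). For the nasal prefixes mem- and pem- it also tries restoring a dropped initial p, since Malay drops that letter in forms such as memukul (from pukul) and pemandu (from pandu), which is what makes P-initial roots hard to recover.

```python
from __future__ import annotations

# Hypothetical toy root-word dictionary; a real stemmer would load a full
# Malay lexicon instead.
ROOT_WORDS = {"pukul", "pandu", "pakai", "pilih", "baca", "main"}

# Suffixes, longest first so "-kan" is tried before "-an".
SUFFIXES = ("kan", "nya", "lah", "an", "i")

# Prefixes, longest first. Each maps to the letters that may have been
# dropped from the root: mem-/pem- delete an initial "p" (pukul -> memukul,
# pandu -> pemandu), so "p" is tried before "" (nothing dropped).
PREFIXES = (
    ("mem", ("p", "")),
    ("pem", ("p", "")),
    ("per", ("",)),
    ("ber", ("",)),
    ("di", ("",)),
    ("me", ("",)),
    ("pe", ("",)),
)


def strip_prefix(word: str) -> str | None:
    """Return the first prefix-stripped candidate found in ROOT_WORDS."""
    for prefix, restorations in PREFIXES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            for letters in restorations:
                if letters + rest in ROOT_WORDS:
                    return letters + rest
    return None


def stem(word: str) -> str:
    """Stem a lowercase word; return it unchanged if no rule finds a root."""
    if word in ROOT_WORDS:
        return word
    # Rule group 1: suffix removal only.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in ROOT_WORDS:
            return word[: -len(suffix)]
    # Rule group 2: prefix removal only.
    root = strip_prefix(word)
    if root is not None:
        return root
    # Rule group 3: suffix removal followed by prefix removal.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            root = strip_prefix(word[: -len(suffix)])
            if root is not None:
                return root
    return word


def evaluate(gold: dict[str, str]) -> tuple[float, float, float]:
    """Precision, recall and F-measure under an assumed counting scheme:
    a word counts as stemmed when stem() changes it, and as needing
    stemming when its gold root differs from the word itself."""
    stemmed = needs_stemming = correct = 0
    for word, root in gold.items():
        output = stem(word)
        changed = output != word
        stemmed += int(changed)
        needs_stemming += int(root != word)
        correct += int(changed and output == root)
    precision = correct / stemmed if stemmed else 0.0
    recall = correct / needs_stemming if needs_stemming else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy gold standard (not the paper's data). "lukis" is deliberately missing
# from ROOT_WORDS, so "pelukis" is left unstemmed and lowers recall.
GOLD = {
    "memukul": "pukul", "pemandu": "pandu", "memakaikan": "pakai",
    "pilihan": "pilih", "permainan": "main", "pemain": "main",
    "membaca": "baca", "baca": "baca", "pelukis": "lukis",
}

if __name__ == "__main__":
    for word in GOLD:
        print(word, "->", stem(word))
    p, r, f = evaluate(GOLD)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=1.000 R=0.875 F=0.933
```

On this toy data the single unstemmed word lowers recall but not precision; the figures are unrelated to the results reported in the paper. Trying the restored "p" before the empty restoration is a design choice that favours P-initial roots, and it can pick the wrong root when both candidates happen to exist in the dictionary.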