Optimizing Clustering of Indonesian Text Data Using Particle Swarm Optimization Algorithm: A Case Study of the Quran Translation

M. D. R. Wahyudi, Agung Fatwanto
Telematika. Journal Article. Published 2024-02-16. DOI: 10.35671/telematika.v17i1.2724 (https://doi.org/10.35671/telematika.v17i1.2724)

Abstract

The Quran, regarded by Muslims as their holy book, contains scientific and historical facts affirming Islam's truth, beauty, and influence on human life. Consequently, the text of the Quran and its translations are valuable sources for text mining research, particularly for studying the interrelationships among its verses. Clustering is one approach to grouping such objects, with K-Means being a prominent example. However, K-Means results are often suboptimal because the initial centroids are selected at random. To address this, the study proposes using the Particle Swarm Optimization (PSO) algorithm to select the centroids. The hybrid PSO algorithm initiates a single iteration of the K-Means algorithm and concludes either upon reaching the maximum iteration limit or when the average shift of the centroid (center-of-mass) vectors falls below 0.0001. Evaluation of the clustering results from the three models (K-Means, K-Means with PSO, and K-Means with hybrid PSO) indicates that the K-Means algorithm produced the lowest Sum of Squared Error (SSE) value, 1032.19, while the hybrid PSO algorithm generated the highest Silhouette value, 0.258, and the lowest quantization value, 0.00947. Further evaluation using a confusion matrix showed an accuracy of 81.7% for K-Means clustering, 82.5% for K-Means with PSO, and 91.1% for K-Means with hybrid PSO, the highest among the three clustering models.
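
The idea of letting PSO choose the starting centroids for K-Means can be illustrated with a minimal sketch. It is not the authors' implementation: the fitness function (quantization error), the swarm size, the inertia and acceleration coefficients, and all function names are assumptions made for illustration, and synthetic 2-D points stand in for the verse vectors. Only the stopping rule (maximum iterations, or average centroid shift below 0.0001) is taken from the abstract.

```python
import numpy as np

def assign(X, centroids):
    """Return the nearest-centroid label of each point and all squared distances."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d

def sse(X, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    labels, d = assign(X, centroids)
    return float(d[np.arange(len(X)), labels].sum())

def quantization_error(X, centroids):
    """Mean, over non-empty clusters, of the average point-to-centroid distance."""
    labels, d = assign(X, centroids)
    dist = np.sqrt(d[np.arange(len(X)), labels])
    per_cluster = [dist[labels == j].mean()
                   for j in range(len(centroids)) if np.any(labels == j)]
    return float(np.mean(per_cluster))

def pso_centroids(X, k, n_particles=10, n_iter=30, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Search for k centroids with PSO; every hyperparameter here is an assumption."""
    rng = np.random.default_rng(seed)
    # Each particle encodes a full set of k centroids, seeded from random data points.
    pos = np.stack([X[rng.choice(len(X), k, replace=False)] for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([quantization_error(X, p) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update: inertia + pull to personal best + pull to global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        fit = np.array([quantization_error(X, p) for p in pos])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest

def kmeans(X, centroids, max_iter=100, tol=1e-4):
    """Refine centroids; stop at max_iter or when the average shift drops below tol."""
    centroids = centroids.copy()
    for _ in range(max_iter):
        labels, _ = assign(X, centroids)
        # Keep the old centroid if a cluster happens to be empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(len(centroids))])
        shift = np.linalg.norm(new - centroids, axis=1).mean()
        centroids = new
        if shift < tol:
            break
    return centroids, assign(X, centroids)[0]

# Synthetic stand-in data: three Gaussian blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])
init = pso_centroids(X, k=3)           # PSO proposes the starting centroids
centroids, labels = kmeans(X, init)    # K-Means refines them
print("SSE:", sse(X, centroids), "quantization error:", quantization_error(X, centroids))
```

Because each K-Means update cannot increase the SSE, the refined centroids are never worse than the PSO seed on that measure. Metrics such as the Silhouette value reported in the abstract would need to be computed separately on the resulting labels.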