Comparative Analysis of Text Mining Classification Algorithms for English and Indonesian Qur’an Translation

IJID International Journal on Informatics for Development, Pub Date: 2019-06-22, DOI: 10.14421/IJID.2019.08108

R. Hidayat, Sekar Minati


Citations: 6

Abstract

The Qur'an, As-Sunnah, and classical Islamic books serve Muslims as sources of knowledge, wisdom, and law. Yet many Muslims who read the Qur'an every day still do not understand the meaning of its verses. This poses a challenge for academics in science and engineering, especially informatics: to explore and represent knowledge through intelligent computing systems that can answer questions based on the Qur'an. This research creates an enabling computational environment for text mining the Qur'an, with the aim of helping people understand each verse. The classification experiment uses the Support Vector Machine (SVM), Naïve Bayes, k-Nearest Neighbor (kNN), and J48 decision tree classifiers on the verses of Al-Baqarah translated into English and Indonesian, with each verse labeled by one of the three most fundamental aspects of Islam: 'Iman' (faith), 'Ibadah' (worship), and 'Akhlaq' (virtues). The Indonesian translation was pre-processed with the Sastrawi package in Python, and the algorithms were implemented in WEKA using its StringToWordVector filter with TF-IDF weighting. The experiments measure accuracy and F-measure using a percentage split, with 66% of the data used for training and the rest for testing. The results show that the SVM classifier achieved the best overall accuracy, 81.443% on the Indonesian translation, while the Naïve Bayes classifier achieved the best accuracy on the English translation, 78.35%.
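The evaluation setup described in the abstract (TF-IDF weighting, a 66% percentage split, accuracy, and F-measure) can be illustrated with a minimal Python sketch. This is not the authors' WEKA configuration: the function names, the toy tokens, and the toy labels are hypothetical, the TF-IDF variant shown (raw term count times log(N/df)) may differ from the weighting WEKA applies, and the split here keeps the original order rather than shuffling.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * log(N / document frequency).
    Assumption: a textbook variant, not necessarily WEKA's exact formula.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def percentage_split(items, train_fraction=0.66):
    """Use the first 66% of items for training and the rest for testing.

    Assumption: order is preserved; no shuffling is done here.
    """
    cut = round(len(items) * train_fraction)
    return items[:cut], items[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, label):
    """Harmonic mean of precision and recall for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: already tokenized "verses" and their labels.
toy_docs = [["faith", "god"], ["prayer", "god"], ["faith", "charity"]]
weights = tf_idf(toy_docs)

# Hypothetical true and predicted labels for four test verses.
true_labels = ["Iman", "Ibadah", "Akhlaq", "Iman"]
predicted = ["Iman", "Ibadah", "Iman", "Iman"]
print(accuracy(true_labels, predicted))
print(f_measure(true_labels, predicted, "Iman"))
```

As a plausibility check, Al-Baqarah has 286 verses; if all of them were used (an assumption, since the abstract does not state the dataset size), `percentage_split` would give 189 training and 97 test verses, and the reported accuracies would correspond to 79/97 ≈ 81.443% and 76/97 ≈ 78.35% correct test predictions.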


Source journal: IJID International Journal on Informatics for Development

Self-citation rate: 0.00%

Review time: 8 weeks