Data Mining for Managing and Using Online Information on Facebook

Journal of Advances in Information Technology | IF 1.5 | Q4, Computer Science, Information Systems | Pub Date: 2023-01-01 | DOI: 10.12720/jait.14.4.769-776

Nidal Al Said



Abstract

This work investigates data mining algorithms for the intelligent analysis of text written in Arabic. The study comprised several stages: data collection and pre-processing; application of data mining algorithms (Multinomial Naïve Bayes classifier, Naïve Bayes classifier, Support Vector Machine, and modified K-means); processing of the study results; and software implementation. A total of 16,732 Facebook posts written exclusively in Arabic were downloaded. About two-thirds of them (11,155 items) were used to train the algorithms, while the remaining 5,577 items were held out as test data. The training data were categorized into five groups by subject (politics, entertainment, medicine, science, and religion), with five keywords used for testing in each group. Politics was the most common subject (4,736 posts). On the largest test set, the multinomial Naïve Bayes classifier proved the most accurate algorithm, while the Support Vector Machine recorded the lowest accuracy. The multinomial Naïve Bayes classifier was most effective on the largest amount of data, while the Support Vector Machine was most effective on the smallest. The fit score was highest at 5,577 data points for the multinomial Naïve Bayes classifier and at 1,394 data points for K-means. To improve and refine the data mining results, the sample should be expanded and more data classes and keywords added. Other machine learning models, such as deep learning algorithms, could also be used. The investigation is significant because it expands knowledge of how machine learning algorithms can be used to mine Arabic text on social media platforms.
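The abstract names the multinomial Naïve Bayes classifier as the strongest performer but does not show how it works. The following is a minimal sketch of that technique in plain Python, with Laplace smoothing. It is not the author's implementation: the function names `train_multinomial_nb` and `predict`, the smoothing value `alpha=1.0`, and the toy English tokens (standing in for pre-processed Arabic posts) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and per-class token counts from tokenized documents."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "priors": {c: n / len(labels) for c, n in class_counts.items()},
        "token_counts": token_counts,
        "totals": {c: sum(token_counts[c].values()) for c in class_counts},
        "vocab": vocab,
        "alpha": alpha,  # Laplace smoothing constant (assumed value)
    }


def predict(model, tokens):
    """Return the class with the highest log-posterior for the given tokens."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, prior in model["priors"].items():
        score = math.log(prior)
        denom = model["totals"][label] + alpha * vocab_size
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # ignore tokens never seen in training
            count = model["token_counts"][label][tok]
            score += math.log((count + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: two of the five subject groups named in the abstract.
train_docs = [
    ["election", "parliament", "vote"],
    ["minister", "vote", "law"],
    ["doctor", "vaccine", "clinic"],
    ["hospital", "doctor", "treatment"],
]
train_labels = ["politics", "politics", "medicine", "medicine"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict(model, ["vote", "law"]))
print(predict(model, ["doctor", "clinic"]))
```

Scores are summed in log space so that products of many small probabilities do not underflow, and smoothing keeps a token that is absent from one class from driving that class's probability to zero. A real pipeline for Arabic text would also need tokenization and normalization steps, which the abstract mentions only as pre-processing.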


