Data Mining for Managing and Using Online Information on Facebook
Nidal Al Said
DOI: 10.12720/jait.14.4.769-776 (https://doi.org/10.12720/jait.14.4.769-776)
Published: 2023-01-01 · Journal Article
Citations: 0
Abstract
This work investigates data mining algorithms for the intelligent analysis of data written in Arabic. The study involves several stages: data collection and pre-processing; data mining algorithms (Multinomial Naïve Bayes classifier, Naïve Bayes classifier, Support Vector Machine, and modified K-means); processing of the study results; and software implementation. A total of 16,732 Facebook posts written exclusively in Arabic were downloaded. About two-thirds of them (11,155 items) were used to train the algorithms, while the rest (5,577 items) were the subject of the analysis. The training data were categorized into five groups by subject (politics, entertainment, medicine, science, and religion), with five keywords used for testing in each group. The largest group of posts (4,736 items) related to politics. Accuracy was highest for the Multinomial Naïve Bayes classifier at the maximum amount of test data and lowest for the Support Vector Machine. The Multinomial Naïve Bayes classifier was most effective at the maximum amount of data, while the Support Vector Machine was most effective at the minimum amount. The fit score of the argument reached its maximum at 5,577 data points for the Multinomial Naïve Bayes classifier and at 1,394 data points for K-means. To improve and refine the data mining results, the sample should be expanded with more data classes and keywords. Other machine learning models, such as deep learning algorithms, could also be used. The investigation is significant because it expands our knowledge of how machine learning algorithms can be used to mine Arabic texts on social media platforms.
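To make the setup concrete, the following is a minimal sketch of a two-thirds train/test split and a multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is an illustration under stated assumptions, not the paper's implementation. The function names (`split_sizes`, `train_multinomial_nb`, `predict`), the rounding rule, the smoothing parameter `alpha`, and the tiny English toy corpus are all hypothetical; the study itself used Arabic posts and five subject classes.

```python
import math
from collections import Counter, defaultdict


def split_sizes(n_items):
    """Return (train, test) sizes for a two-thirds split.

    Assumption: the training share is rounded to the nearest integer.
    This rule happens to reproduce the counts reported in the abstract.
    """
    n_train = round(2 * n_items / 3)
    return n_train, n_items - n_train


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)  # per-class token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"alpha": alpha, "vocab": vocab, "classes": {}}
    for label, n_docs in docs_per_class.items():
        model["classes"][label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "counts": word_counts[label],
            "total": sum(word_counts[label].values()),
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior for the tokens."""
    alpha, vocab_size = model["alpha"], len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, stats in model["classes"].items():
        score = stats["log_prior"]
        denom = stats["total"] + alpha * vocab_size
        for tok in tokens:
            if tok in model["vocab"]:  # tokens unseen in training are skipped
                score += math.log((stats["counts"][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus (English placeholders for illustration only).
train_docs = [
    ["election", "parliament", "vote"],
    ["minister", "election", "policy"],
    ["film", "music", "concert"],
    ["actor", "film", "festival"],
    ["doctor", "vaccine", "hospital"],
    ["vaccine", "treatment", "doctor"],
]
train_labels = ["politics", "politics", "entertainment",
                "entertainment", "medicine", "medicine"]

model = train_multinomial_nb(train_docs, train_labels)
print(split_sizes(16732))
print(predict(model, ["election", "vote", "today"]))
```

Scores are accumulated in log space to avoid underflow on longer posts, and the additive smoothing keeps a single unseen word from zeroing out a class. A real pipeline for Arabic text would also need tokenization and normalization steps, which this sketch omits.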