Data Mining for Managing and Using Online Information on Facebook

Pub Date : 2023-01-01 DOI:10.12720/jait.14.4.769-776
Nidal Al Said
Citations: 0

Abstract

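This work investigates data mining algorithms for the intelligent analysis of text written in Arabic. The study involves several stages: data collection and pre-processing; data mining algorithms (Multinomial Naïve Bayes classifier, Naïve Bayes classifier, Support Vector Machine, and modified K-Means); and processing of the study results and software implementation. A total of 16,732 Facebook posts written exclusively in Arabic were downloaded. About two-thirds of them (11,155 posts) were used to train the algorithms, while the remaining 5,577 posts served as the test data under study. The training data were categorized into five subject groups (politics, entertainment, medicine, science, and religion), with five keywords used for testing in each group. Politics was the largest category (4,736 posts). The multinomial Naïve Bayes classifier proved the most accurate algorithm at the maximum amount of test data, while the Support Vector Machine recorded the lowest accuracy values. The effectiveness of the multinomial Naïve Bayes classifier was most pronounced at the maximum amount of data, whereas the Support Vector Machine was most effective at the minimum amount. The fit score peaked at 5,577 data points for the multinomial Naïve Bayes classifier and at 1,394 data points for K-means. To improve and refine the data mining results, the sample should be expanded with more data classes and keywords. Other machine learning models, such as deep learning algorithms, could also be used. The study is significant because it expands knowledge about the use of machine learning algorithms to mine Arabic texts on social media platforms.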
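To make the classification step concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, together with the split sizes reported in the abstract. It is not the author's implementation: the `MultinomialNB` class, the toy English tokens (standing in for tokenized Arabic posts), and the two-class setup are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Split sizes reported in the abstract.
TOTAL_POSTS = 16_732
TRAIN_POSTS = 11_155
TEST_POSTS = TOTAL_POSTS - TRAIN_POSTS  # posts held out from training


class MultinomialNB:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_label / n_docs)
            for w in tokens:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training posts (already tokenized); not data from the study.
train_docs = [
    ["election", "parliament", "vote"],
    ["minister", "election", "government"],
    ["hospital", "doctor", "vaccine"],
    ["vaccine", "treatment", "doctor"],
]
train_labels = ["politics", "politics", "medicine", "medicine"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict(["doctor", "vaccine", "election"])
print(TEST_POSTS, round(TRAIN_POSTS / TOTAL_POSTS, 3), prediction)
```

In this toy run the post mixes one politics keyword with two medicine keywords, so the medicine class scores higher. A real pipeline would also need Arabic-specific pre-processing (tokenization, normalization, stop-word removal), which the abstract mentions but does not detail.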