Fake News Detection Using Machine Learning Techniques

Achhiya Sultana, Mahmudul Islam, Mahady Hasan, F. Ahmed
2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA), published 2023-05-23
DOI: 10.1109/SERA57763.2023.10197712

Abstract

People spread a great deal of information on social media to update their status and share important news with others. However, most of these platforms do not promptly validate users or their posts, and people cannot reliably identify fake news manually. An automated system capable of detecting fake news is therefore needed. This research proposes a model built with four machine learning algorithms. The dataset used in the experiment combines two datasets and contains almost equal numbers of true and fake news articles on politics. Preprocessing begins with cleaning the data (tokenization; removal of punctuation, special characters, white space, redundant words, numerals, and English letters), continues with stemming, and ends with data discretization. We then analyzed the collected data and used 80% of it to train each model. Four classification algorithms were applied to identify fake news among the articles: Logistic Regression, Decision Tree, Random Forest, and Gradient Boosting Classifier. The accuracy of the trained classifiers was evaluated on the remaining 20% of the data. The decision tree model achieves the best accuracy, 99.60%, followed by gradient boosting at 99.55%, random forest at 99.10%, and logistic regression at 98.99%. We also examined which model achieves the highest precision, recall, and F1-score, based on the confusion matrix.
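The abstract gives no implementation details, so the following is only a minimal sketch of three of the steps it names: text cleaning, the 80/20 split, and metrics derived from confusion-matrix counts. All function names, the tiny stop-word list, the lowercasing step, and the crude suffix-stripping stemmer are assumptions made for illustration and are not taken from the paper. The four classifiers themselves are not implemented here; in practice they would come from a machine learning library.

```python
import random
import re

# Tiny illustrative stop-word list (an assumption; the paper does not list its stop words).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer, not the paper's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_text(text):
    """Lowercase, drop punctuation/special characters/numerals, tokenize, filter, stem."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep only letters and whitespace
    tokens = text.split()  # split() also collapses repeated whitespace
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def split_80_20(items, seed=0):
    """Shuffle a copy of the items and return (train, test) in an 80/20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With a trained classifier's predictions on the held-out 20%, `binary_metrics(y_test, y_pred)` would give the figures needed to compare models on more than accuracy alone, which matters when the cost of a missed fake article differs from the cost of a false alarm.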