Detection of Fake News and Hoaxes on Information from Web Scraping using Classifier Methods

F. W. Wibowo, Akhmad Dahlan, Wihayati
2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI)
Publication date: 2021-12-16
DOI: 10.1109/ISRITI54043.2021.9702824
https://doi.org/10.1109/ISRITI54043.2021.9702824
Citations: 3

Abstract

Current technological developments let people access information from handheld gadgets. This technology, however, has both good and bad effects. Fake news and hoaxes have spread along with the social media applications these devices provide. This paper aims to detect fake news and hoaxes using classification models. The models implemented are support vector machine (SVM), random forest, nearest centroid, stochastic gradient descent (SGD), decision tree, bagging, AdaBoost, gradient boosting, multi-layer perceptron artificial neural network (MLP ANN), and K-nearest neighbors (K-NN). The dataset, obtained through web scraping, consists of 1116 Indonesian-language news items, split 70% for training and 30% for testing. The test set contains 335 items: 205 fake news and hoax items and 130 real news items. The web content is processed using natural language processing (NLP) methods. Random forest is the best model for classifying fake news and hoaxes, with an accuracy of 89%. The next highest-scoring models are SVM, gradient boosting, AdaBoost, SGD, and decision tree, in that order, each with accuracy above 80%. The remaining methods have accuracy below 80%.
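The figures in the abstract can be checked with a little arithmetic. The sketch below is an illustration, not the authors' code: it recomputes the 70/30 split sizes and the accuracy of a trivial classifier that always predicts the majority class, which gives a floor against which the reported scores can be read. The function names are hypothetical, and rounding the test size up is an assumption chosen because it reproduces the 335 test items reported.

```python
import math

# Figures taken from the abstract.
TOTAL_ITEMS = 1116    # scraped Indonesian-language news items
TEST_FRACTION = 0.30  # share of the data held out for testing
TEST_FAKE = 205       # fake news and hoax items in the test set
TEST_REAL = 130       # real news items in the test set


def split_sizes(total, test_fraction):
    """Return (train_size, test_size), rounding the test size up.

    Rounding up is an assumption; the abstract does not state the rule.
    """
    test_size = math.ceil(total * test_fraction)
    return total - test_size, test_size


def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)


train_size, test_size = split_sizes(TOTAL_ITEMS, TEST_FRACTION)
baseline = majority_baseline([TEST_FAKE, TEST_REAL])

print(f"train={train_size}, test={test_size}")
print(f"majority-class baseline accuracy: {baseline:.1%}")
```

Under these assumptions the split gives 781 training items and 335 test items, matching the abstract, and a classifier that labels every test item as fake would score about 61%. Accuracy figures for the individual models are easier to interpret against that floor than against 50%.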