Experimental Evaluation of Ensemble Learning-Based Models for Twitter Spam Classification

R. Jimoh, A. Oyelakin, I. S. Olatinwo, K. Y. Obiwusi, S. Muhammad-Thani, T. S. Ogundele, A. Giwa-Raheem, O. F. Ayepeku
Published in: 2022 5th Information Technology for Education and Development (ITED)
Publication date: 2022-11-01
DOI: 10.1109/ITED56637.2022.10051587
URL: https://doi.org/10.1109/ITED56637.2022.10051587
Citations: 1

Abstract

People with malicious intent keep launching attacks on the internet through various means. These attackers are shifting their attacks to social networking sites such as Twitter, Facebook, and Instagram. One attack method is the use of spam on social media platforms. Social network spam is unwanted content that appears on such sites. Since attackers have shifted their attention to social media platforms for carrying out their nefarious activities, there is a need to keep devising security measures to characterise social media-based spam attacks. This study involves an experimental evaluation of two ensemble learning models for Twitter spam classification. The dataset employed is a publicly available Twitter spam dataset. Its files fall into four groups, each containing different Twitter spam evidence. Every file in the dataset was used in the experiments. Exploratory analysis of the dataset files was carried out one file at a time. Thereafter, label encoding was used to handle the categorical feature. Two tree-based ensemble learning algorithms, Random Forest (RF) and Extra Trees, were then chosen to build the Twitter spam detection models. Each set of dataset files was used for training and testing the models. The performance of the models was evaluated and compared. The study found the performance of the Twitter spam detection models to be promising. Overall, the RF-based model recorded better accuracy, precision, recall, and F1-score than the Extra Trees-based model.
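
The workflow summarised in the abstract (label-encode the categorical feature, train a Random Forest and an Extra Trees model, then compare accuracy, precision, recall, and F1-score) could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the synthetic data, the feature names (`followers`, `urls`, `client`), the 70/30 split, and the hyperparameters are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_synthetic_tweets(n_rows=600, seed=0):
    """Hypothetical stand-in for one dataset file; NOT the study's data."""
    rng = np.random.default_rng(seed)
    is_spam = rng.integers(0, 2, size=n_rows)
    # Two numeric features whose distributions differ by class (assumed).
    followers = rng.poisson(lam=np.where(is_spam == 1, 50, 400))
    urls = rng.poisson(lam=np.where(is_spam == 1, 2.0, 0.3))
    # One categorical feature stored as strings (hypothetical name).
    client = rng.choice(np.array(["web", "mobile", "api"]), size=n_rows)
    labels = np.where(is_spam == 1, "spammer", "non-spammer")
    return followers, urls, client, labels


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit one model and report the four metrics named in the abstract."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }


def run_experiment(seed=0):
    followers, urls, client, labels = make_synthetic_tweets(seed=seed)
    # Label encoding maps each category string to an integer code.
    client_codes = LabelEncoder().fit_transform(client)
    # Classes are sorted alphabetically: "non-spammer" -> 0, "spammer" -> 1.
    y = LabelEncoder().fit_transform(labels)
    X = np.column_stack([followers, urls, client_codes])
    # The 70/30 stratified split is an assumption, not taken from the paper.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=seed),
    }
    return {
        name: evaluate(model, X_train, X_test, y_train, y_test)
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, scores in run_experiment().items():
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scores obtained on this synthetic data say nothing about the study's results; the sketch only shows how the two models can be trained and compared on the same split. To mirror the per-file experiments described in the abstract, `run_experiment` would be called once per dataset file with the real features in place of the synthetic ones.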