Experimental Evaluation of Ensemble Learning-Based Models for Twitter Spam Classification

2022 5th Information Technology for Education and Development (ITED). Pub Date: 2022-11-01. DOI: 10.1109/ITED56637.2022.10051587

R. Jimoh, A. Oyelakin, I. S. Olatinwo, K. Y. Obiwusi, S. Muhammad-Thani, T. S. Ogundele, A. Giwa-Raheem, O. F. Ayepeku


Citations: 1

Abstract

People with malicious intent keep launching attacks on the internet through various means. These attackers are shifting their attacks to social sites such as Twitter, Facebook, Instagram and the like. One attack method is the use of spam on social media platforms. Social network spam is unwanted content that appears on social networking sites such as Facebook, Twitter, Instagram and related ones. Since attackers have shifted their attention to social media platforms for carrying out their nefarious activities, there is a need to keep devising security measures to characterise social media-based spam attacks. This study involves an experimental evaluation of two ensemble learning models for Twitter spam classification. The dataset employed in this study is a publicly available dataset for Twitter spam studies. The dataset files are in four different groups, each containing different Twitter spam evidence. In each experiment, each file in the whole dataset was used. Exploratory analysis of the datasets was carried out, one at a time. Thereafter, label encoding was used to handle the categorical feature. Then, two tree-based ensemble learning algorithms, Random Forest (RF) and Extra Trees, were chosen to build the Twitter spam detection models. Each set of dataset files was used for training and testing the machine learning-based Twitter spam detection models. The performances of the models were evaluated and compared. The study revealed that the performances of the Twitter spam detection models were promising. Overall, the RF-based model recorded better performance in accuracy, precision, recall and F1-score than the Extra Trees-based model.
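The pipeline described in the abstract (label-encode the categorical feature, train Random Forest and Extra Trees, compare accuracy, precision, recall and F1-score) can be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the authors' code: the data is synthetic, and the feature names, the `source` categorical column, the 70/30 split and the hyperparameters are all assumptions, since the abstract does not state them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_toy_spam_data(n_samples=600, seed=0):
    """Synthetic stand-in for one dataset file (NOT the paper's data)."""
    X_num, y = make_classification(
        n_samples=n_samples, n_features=6, n_informative=4, random_state=seed
    )
    rng = np.random.default_rng(seed)
    # Hypothetical categorical feature, e.g. the client a tweet was posted from.
    source = rng.choice(["web", "android", "iphone"], size=n_samples)
    return X_num, source, y


def evaluate_models(X_num, source, y, seed=0):
    """Train both ensembles on one dataset and return their test metrics."""
    # Label encoding: map each category string to an integer code.
    source_codes = LabelEncoder().fit_transform(source)
    X = np.column_stack([X_num, source_codes])

    # Assumed 70/30 stratified split; the abstract does not give the ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "extra_trees": ExtraTreesClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Label 1 is treated as the positive ("spam") class.
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    return results


if __name__ == "__main__":
    X_num, source, y = make_toy_spam_data()
    for name, scores in evaluate_models(X_num, source, y).items():
        print(name, {k: round(v, 3) for k, v in scores.items()})
```

To mirror the study's setup, `evaluate_models` would be called once per dataset file and the resulting metric tables compared side by side. Scores on this synthetic data say nothing about the results reported in the paper.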
