R. Jimoh, A. Oyelakin, I. S. Olatinwo, K. Y. Obiwusi, S. Muhammad-Thani, T. S. Ogundele, A. Giwa-Raheem, O. F. Ayepeku
2022 5th Information Technology for Education and Development (ITED), published 2022-11-01. DOI: 10.1109/ITED56637.2022.10051587
Experimental Evaluation of Ensemble Learning-Based Models for Twitter Spam Classification
People with malicious intent keep launching attacks on the internet through various means. These attackers are shifting their attacks to social sites such as Twitter, Facebook, Instagram, and the like. One attack method is the use of spam on social media platforms. Social network spam involves unwanted content that appears on social networking sites such as Facebook, Twitter, Instagram, and related ones. Since attackers have shifted their attention to social media platforms for carrying out their nefarious activities, there is a need to keep devising security measures to characterise social-media-based spam attacks. This study involves an experimental evaluation of two ensemble learning models for Twitter spam classification. The dataset employed in this study is a publicly available dataset for Twitter spam studies. The dataset files are in four different groups, each containing different Twitter spam evidence. In each experiment, each file in the whole dataset was used. Exploratory analysis of the datasets was carried out, one at a time. Thereafter, the label encoding technique was used to handle the categorical feature. Then, two tree-based ensemble learning algorithms, namely Random Forest (RF) and Extra Trees, were chosen to build the Twitter spam detection models. Each set of dataset files was used for the training and testing of the machine learning-based Twitter spam detection models. The performances of the models built were evaluated and compared. The study revealed that the performances of the Twitter spam detection models were promising. Overall, the RF-based model recorded better performance in accuracy, precision, recall, and F1-score than the Extra Trees-based model.
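The pipeline described above (label encoding, then Random Forest and Extra Trees, then comparison on accuracy, precision, recall, and F1-score) can be sketched with scikit-learn. This is only an illustrative sketch, not the authors' code: the data is synthetic, the feature names (`followers`, `urls`, `source`), the toy labelling rule, the 70/30 split, and the hyperparameters are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder


def make_toy_tweets(n_rows=400, seed=0):
    """Build a small synthetic dataset; all features here are hypothetical."""
    rng = np.random.default_rng(seed)
    followers = rng.integers(0, 5000, size=n_rows)
    urls = rng.integers(0, 5, size=n_rows)
    source = rng.choice(["web", "mobile", "api"], size=n_rows)
    # Toy labelling rule so the models have a pattern to learn.
    is_spam = (urls >= 3) | ((source == "api") & (followers < 1000))
    labels = np.where(is_spam, "spammer", "non-spammer")
    return followers, urls, source, labels


def evaluate_models(followers, urls, source, labels, seed=0):
    """Train both ensembles on the same split and return their metrics."""
    # Label-encode the categorical feature and the class label.
    source_encoded = LabelEncoder().fit_transform(source)
    y = LabelEncoder().fit_transform(labels)  # non-spammer -> 0, spammer -> 1
    X = np.column_stack([followers, urls, source_encoded])

    # Assumed 70/30 stratified split; the abstract does not state the ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, pred, average="binary", zero_division=0
        )
        results[name] = {
            "accuracy": float(accuracy_score(y_test, pred)),
            "precision": float(precision),
            "recall": float(recall),
            "f1": float(f1),
        }
    return results


results = evaluate_models(*make_toy_tweets())
for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scores on this toy data say nothing about the paper's reported results; the sketch only shows how the two models can be trained on the same split and compared on the same four metrics.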