Bio-Inspired Algorithm Based Undersampling Approach and Ensemble Learning for Twitter Spam Detection

IF 1 · Zone 4 (Computer Science) · Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems · Pub Date: 2024-02-20 · DOI: 10.1142/s0218488524500016
K. Kiruthika Devi, G. A. Sathish Kumar
Citations: 0

Abstract

Social media networks such as Facebook and Twitter have evolved into valuable platforms for global communication. However, because of its extensive user base, Twitter is often misused by illegitimate users engaging in illicit activities. Although numerous research papers address combating illegitimate users on Twitter, a common shortcoming of most of them is the failure to address class imbalance, which significantly impacts the effectiveness of spam detection. The few works that have addressed class imbalance have not yet applied bio-inspired algorithms to balance the dataset. Therefore, we introduce PSOB-U, a particle swarm optimization-based undersampling technique designed to balance the Twitter dataset. In PSOB-U, various classifiers and metrics are employed to select majority samples and rank them. Furthermore, an ensemble learning approach is implemented to combine the base classifiers in three stages. During the training phase of the base classifiers, undersampling techniques and a cost-sensitive random forest (CS-RF) are used to address the imbalanced data at both the data and algorithmic levels. In the first stage, imbalanced datasets are balanced using random undersampling, particle swarm optimization-based undersampling, and random oversampling. In the second stage, a classifier is constructed for each of the balanced datasets obtained through these sampling techniques. In the third stage, majority voting aggregates the predicted outputs of the three classifiers. The evaluation results demonstrate that the proposed method significantly enhances the detection of illegitimate users in the imbalanced Twitter dataset. Additionally, we compare the proposed work with existing models, and the results highlight the superiority of our spam detection model over state-of-the-art spam detection models that address the class imbalance problem. The combination of particle swarm optimization-based undersampling and the ensemble learning approach using majority voting results in more accurate spam detection.
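The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how PSOB-U searches or scores samples, so `ranked_undersample` is a hypothetical stand-in that simply keeps the highest-ranked majority samples under a caller-supplied `score` function, and `CentroidClassifier` is a toy nearest-centroid learner standing in for the cost-sensitive random forest. All function and class names here are assumptions made for illustration.

```python
import random
from collections import Counter


def _split(y, majority):
    """Return (majority indices, minority indices) for a binary label list."""
    maj = [i for i, label in enumerate(y) if label == majority]
    mino = [i for i, label in enumerate(y) if label != majority]
    return maj, mino


def random_undersample(X, y, rng, majority=0):
    """Stage 1a: drop random majority samples until the classes are equal."""
    maj, mino = _split(y, majority)
    keep = rng.sample(maj, len(mino)) + mino
    return [X[i] for i in keep], [y[i] for i in keep]


def ranked_undersample(X, y, score, majority=0):
    """Stage 1b: hypothetical stand-in for PSOB-U (no swarm search here).

    Keeps the majority samples that rank highest under `score`.
    """
    maj, mino = _split(y, majority)
    ranked = sorted(maj, key=lambda i: score(X[i]), reverse=True)
    keep = ranked[:len(mino)] + mino
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, rng, majority=0):
    """Stage 1c: duplicate random minority samples until the classes are equal."""
    maj, mino = _split(y, majority)
    extra = [rng.choice(mino) for _ in range(len(maj) - len(mino))]
    keep = maj + mino + extra
    return [X[i] for i in keep], [y[i] for i in keep]


class CentroidClassifier:
    """Toy nearest-centroid base learner (stand-in for CS-RF)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]


def majority_vote(predictions):
    """Stage 3: per-sample majority vote over the classifiers' predictions."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def train_ensemble(X, y, score, seed=0):
    """Stages 1 and 2: balance the data three ways, fit one classifier each."""
    rng = random.Random(seed)
    balanced = [
        random_undersample(X, y, rng),
        ranked_undersample(X, y, score),
        random_oversample(X, y, rng),
    ]
    return [CentroidClassifier().fit(Xb, yb) for Xb, yb in balanced]


def predict_ensemble(models, X):
    """Apply every base classifier and aggregate by majority vote."""
    return majority_vote([m.predict(X) for m in models])
```

With three base classifiers and binary labels, the vote can never tie, which is one practical reason to combine an odd number of resampling strategies.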

Source journal
CiteScore: 2.70
Self-citation rate: 0.00%
Articles published: 48
Review time: 13.5 months
About the journal: The International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems is a forum for research on various methodologies for the management of imprecise, vague, uncertain or incomplete information. The aim of the journal is to promote theoretical or methodological works dealing with all kinds of methods to represent and manipulate imperfectly described pieces of knowledge, excluding results on pure mathematics or simple applications of existing theoretical results. It is published bimonthly, with worldwide distribution to researchers, engineers, decision-makers, and educators.