Twitter spam drift detection by semi supervised learning approach using YATSI algorithm

International Journal of System Assurance Engineering and Management (IF 1.6, Q2, Engineering, Multidisciplinary). Pub Date: 2024-08-12. DOI: 10.1007/s13198-024-02445-1
P. Sivakumar, M. Balasubramani, R. Sowndharya, B. S. Deepa Priya, W. Deva Priya, Maganti Syamala
Citations: 0

Abstract

Twitter has changed how people acquire knowledge and information by letting them share their thoughts and opinions in everyday tweets. Its enormous popularity, however, has also made it an attractive platform for spreading spam, and Twitter spam, in contrast to other types of spam, has recently become a major concern. The large number of users and the volume of content published on Twitter contribute considerably to the rise of spam. To protect users, Twitter and researchers have developed several spam detection systems that employ various machine-learning techniques. According to a recent study, existing machine-learning-based detection algorithms fail to detect spam correctly because the features of spam tweets vary over time. This issue is referred to as “Twitter Spam Drift.” This paper proposes a semi-supervised learning approach (SSLA) using the YATSI algorithm. YATSI proceeds in two phases: the first builds an initial prediction model, and the second uses ML algorithms to determine the actual predictions for unlabeled cases. To deal with the drift, the study uses a live stream of Twitter data acquired through the Twitter API. The proposed method uses pre-processed labelled data to learn the structure of live-downloaded unlabeled data in order to distinguish between genuine and fake users. Experiments were conducted on live Twitter data using KNN, SVM, and NB machine-learning classifiers. Among these classifiers, SVM achieved the best accuracy.
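The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the features, the first-phase model, or the weighting. The sketch assumes a simple nearest-centroid model as a stand-in for the first phase, a weighted k-nearest-neighbour vote for the second phase, a down-weighting factor `f` for pre-labelled points, and toy two-dimensional features. All function names and data values are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Phase-1 stand-in model (an assumption): one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_centroid(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def yatsi_predict(X_lab, y_lab, X_unlab, x_query, k=3, f=1.0):
    """Two-phase, YATSI-style prediction for a single query point."""
    # Phase 1: train on labelled data, then pre-label the unlabelled data.
    centroids = fit_centroids(X_lab, y_lab)
    pre_labels = [predict_centroid(centroids, x) for x in X_unlab]

    # Labelled points get weight 1.0; pre-labelled points are scaled by
    # f * (N / M), with N labelled and M unlabelled points (an assumption
    # of this sketch, so that a large unlabelled set does not dominate).
    w_unlab = f * len(X_lab) / len(X_unlab) if X_unlab else 0.0
    pool = [(x, y, 1.0) for x, y in zip(X_lab, y_lab)]
    pool += [(x, y, w_unlab) for x, y in zip(X_unlab, pre_labels)]

    # Phase 2: weighted vote among the k nearest neighbours in the pool.
    neighbours = sorted(pool, key=lambda p: math.dist(p[0], x_query))[:k]
    votes = defaultdict(float)
    for _, label, weight in neighbours:
        votes[label] += weight
    return max(votes, key=votes.get)


# Hypothetical toy features, e.g. (link ratio, hashtag ratio) per account.
X_lab = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_lab = ["ham", "ham", "spam", "spam"]
X_unlab = [[0.1, 0.2], [1.1, 1.0], [0.8, 0.8]]

print(yatsi_predict(X_lab, y_lab, X_unlab, [0.95, 0.95]))
print(yatsi_predict(X_lab, y_lab, X_unlab, [0.05, 0.05]))
```

In a drift setting, the unlabelled pool would be refreshed from the live stream so that recent tweets influence the second-phase vote. The first-phase model could be swapped for any of the classifiers the abstract names (KNN, SVM, NB).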

Source journal metrics: CiteScore 4.30; self-citation rate 10.00%; articles published: 252
Journal description: This Journal is established with a view to cater to increased awareness for high quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to the emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach by using a wide variety of cross-disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.