Twitter spam drift detection by semi supervised learning approach using YATSI algorithm

International Journal of System Assurance Engineering and Management (IF 1.4; JCR Q2, Engineering, Multidisciplinary). Published: 2024-08-12. DOI: 10.1007/s13198-024-02445-1

P. Sivakumar, M. Balasubramani, R. Sowndharya, B. S. Deepa Priya, W. Deva Priya, Maganti Syamala


Citations: 0

Abstract

Twitter has become a platform through which people acquire knowledge and information by sharing their thoughts and opinions in everyday tweets. However, its enormous popularity has also made Twitter attractive to spammers. Twitter spam, as distinct from other types of spam, has recently become a major concern, and the large number of users and the volume of content published on Twitter contribute considerably to its growth. To protect users, Twitter and researchers have developed several spam detection systems based on various machine learning techniques. According to a recent study, however, existing machine-learning-based detectors fail to detect spam reliably because the features of spam tweets change over time; this problem is referred to as “Twitter Spam Drift.” This paper proposes a semi-supervised learning approach (SSLA) based on the YATSI algorithm, which works in two phases: the first builds an initial prediction model, and the second uses ML algorithms to determine the final predictions for the unlabeled instances. To cope with drift, the study uses a live stream of Twitter data acquired through the Twitter API. The proposed method uses pre-processed labeled data to learn the structure of live-downloaded unlabeled data in order to distinguish between genuine and fake users. Experiments were conducted on live Twitter data with KNN, SVM, and NB classifiers; among these, SVM achieved the highest accuracy.
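The two-phase YATSI idea summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' implementation: the nearest-centroid base classifier stands in for the KNN, SVM, or NB models used in the paper, and the function names, the parameters `k` and `F`, the weighting of pre-labeled instances (`F × |labeled| / |unlabeled|`), and the toy two-feature account data are all assumptions.

```python
import math
from collections import defaultdict


def nearest_centroid(train_X, train_y, query_X):
    """Hypothetical stage-1 base classifier (stand-in for KNN, SVM or NB)."""
    # Group labeled feature vectors by class.
    groups = defaultdict(list)
    for x, y in zip(train_X, train_y):
        groups[y].append(x)
    # Per-class mean vector (centroid).
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    # Assign each query vector to the class whose centroid is closest.
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in query_X]


def yatsi(labeled_X, labeled_y, unlabeled_X, base=nearest_centroid, k=3, F=0.1):
    """Two-stage YATSI-style sketch; returns final labels for unlabeled_X."""
    if not unlabeled_X:
        return []
    # Stage 1: pre-label the unlabeled points with a model trained on labeled data only.
    pre_labels = base(labeled_X, labeled_y, unlabeled_X)

    # Stage 2: pool labeled points (weight 1.0) with pre-labeled points, which
    # share a total weight of F * |labeled| (assumed weighting scheme).
    w_pre = F * len(labeled_X) / len(unlabeled_X)
    pool = [(x, y, 1.0) for x, y in zip(labeled_X, labeled_y)]
    pool += [(x, y, w_pre) for x, y in zip(unlabeled_X, pre_labels)]

    final = []
    for i, q in enumerate(unlabeled_X):
        own = len(labeled_X) + i  # index of q's own pre-labeled copy in the pool
        # Exclude that copy so a point cannot vote for its own pre-label.
        candidates = [p for j, p in enumerate(pool) if j != own]
        # Weighted k-nearest-neighbour vote: each neighbour adds its instance weight.
        neighbours = sorted(candidates, key=lambda p: math.dist(q, p[0]))[:k]
        votes = defaultdict(float)
        for _, label, weight in neighbours:
            votes[label] += weight
        final.append(max(votes, key=votes.get))
    return final


if __name__ == "__main__":
    # Toy data: two hypothetical numeric features per account.
    labeled_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
    labeled_y = ["genuine", "genuine", "spam", "spam"]
    unlabeled_X = [(0.15, 0.15), (0.85, 0.85), (0.2, 0.25), (0.9, 0.9)]
    print(yatsi(labeled_X, labeled_y, unlabeled_X))
    # ['genuine', 'spam', 'genuine', 'spam']
```

With a small `F`, labeled points dominate each vote and pre-labeled points mainly break ties or fill sparse regions of the feature space; raising `F` lets the structure of the unlabeled (for example, live-streamed) data carry more weight.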

Source journal: International Journal of System Assurance Engineering and Management (Engineering, Multidisciplinary)

CiteScore: 4.30
Self-citation rate: 10.00%
Articles published: 252

Journal description: This Journal is established with a view to cater to increased awareness for high-quality research in the seamless integration of heterogeneous technologies to formulate bankable solutions to emergent complex engineering problems. Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach using a wide variety of cross-disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The Journal only publishes high-quality papers that have passed the rigorous peer review procedure of an archival scientific Journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.