Twitter spam drift detection by semi supervised learning approach using YATSI algorithm
P. Sivakumar, M. Balasubramani, R. Sowndharya, B. S. Deepa Priya, W. Deva Priya, Maganti Syamala
International Journal of System Assurance Engineering and Management (JCR Q2, Engineering, Multidisciplinary; impact factor 1.6)
Published: 2024-08-12
DOI: https://doi.org/10.1007/s13198-024-02445-1
Citations: 0
Abstract
Twitter has changed how people acquire knowledge and information by letting them share their thoughts and opinions in everyday tweets. Its enormous popularity, however, has also made it attractive to spammers. Twitter spam, in contrast to other types of spam, has recently become a major concern, and the enormous number of users and the volume of content published on Twitter contribute considerably to its rise. To protect users, Twitter and the research community have developed several spam detection systems that employ various machine-learning techniques. According to a recent study, existing machine learning-based detection algorithms fail to detect spam correctly because the features of spam tweets vary over time, an issue referred to as “Twitter Spam Drift.” This paper proposes a semi-supervised learning approach (SSLA) using the YATSI algorithm. YATSI proceeds in two phases: the first builds an initial prediction model, and the second uses ML algorithms to determine the actual predictions for unlabeled cases. To deal with the drift, the study uses a live stream of Twitter data acquired through the Twitter API. The proposed method uses pre-processed labelled data to learn the structure of live-downloaded unlabeled data in order to distinguish between genuine and fake users. Experiments were conducted on live Twitter data using KNN, SVM and NB machine learning classifiers. Among these classifiers, SVM showed the best results in terms of accuracy.
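The two-phase idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no details, so everything below is an assumption. The stage-1 model is a simple nearest-centroid stand-in (the paper reports KNN, SVM and NB), the two numeric features per tweet are hypothetical, and the weighting of pre-labelled instances by `f * N / M` follows the commonly cited description of YATSI as a weighted nearest-neighbour scheme rather than anything stated in this listing.

```python
import math
from collections import defaultdict


def centroid_fit(X, y):
    """Stage-1 stand-in classifier: one mean vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] += 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def centroid_predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: math.dist(model[c], x))


def yatsi_fit(X_lab, y_lab, X_unlab, f=1.0):
    """Phase 1: train on labelled data, pre-label the unlabelled data,
    and merge both into one weighted pool (assumed weighting scheme)."""
    model = centroid_fit(X_lab, y_lab)
    pre_labels = [centroid_predict(model, x) for x in X_unlab]
    # Labelled instances get weight 1.0; pre-labelled ones get f * N / M.
    w_unlab = f * len(X_lab) / len(X_unlab) if X_unlab else 0.0
    pool = [(x, y, 1.0) for x, y in zip(X_lab, y_lab)]
    pool += [(x, y, w_unlab) for x, y in zip(X_unlab, pre_labels)]
    return pool


def yatsi_predict(pool, x, k=3):
    """Phase 2: weighted vote among the k nearest neighbours in the pool."""
    neighbours = sorted(pool, key=lambda item: math.dist(item[0], x))[:k]
    votes = defaultdict(float)
    for _, label, weight in neighbours:
        votes[label] += weight
    return max(votes, key=votes.get)


# Hypothetical toy features (e.g. scaled URL count, scaled hashtag count).
X_lab = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y_lab = ["ham", "ham", "spam", "spam"]
X_unlab = [[0.1, 0.2], [0.8, 0.9], [0.95, 0.95], [0.05, 0.0]]

pool = yatsi_fit(X_lab, y_lab, X_unlab)
print(yatsi_predict(pool, [0.85, 0.9]))  # prints "spam"
```

In a drift setting, one plausible use is to rebuild the pool periodically from newly downloaded unlabelled tweets, so that the neighbourhood used for voting follows the changing feature distribution; whether the paper does this is not stated in the abstract.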
About the journal:
This journal was established to meet the growing awareness of the need for high-quality research into the seamless integration of heterogeneous technologies, with the aim of formulating bankable solutions to emergent complex engineering problems.
Assurance engineering could be thought of as relating to the provision of higher confidence in the reliable and secure implementation of a system’s critical characteristic features through the espousal of a holistic approach that uses a wide variety of cross-disciplinary tools and techniques. Successful realization of sustainable and dependable products, systems and services involves an extensive adoption of Reliability, Quality, Safety and Risk related procedures for achieving high assurance levels of performance; also pivotal are the management issues related to risk and uncertainty that govern the practical constraints encountered in their deployment. It is our intention to provide a platform for the modeling and analysis of large engineering systems, among the other aforementioned allied goals of systems assurance engineering, leading to the enforcement of performance enhancement measures. Achieving a fine balance between theory and practice is the primary focus. The journal only publishes high-quality papers that have passed the rigorous peer review procedure of an archival scientific journal. The aim is an increasing number of submissions, wide circulation and a high impact factor.