Spam Text Detection Over Social Media Usage: A Supervised Sampling Approach for the Social Web of Things
Haewon Byeon, Sameer Jha, Ismail Keshta, Mohammed Wasim Bhatt, P. Singh, Latika Jindal, T. R. Vijaya Lakshmi
IEEE Systems, Man, and Cybernetics Magazine, vol. 8 10, pp. 32-39. Published 2024-04-01.
DOI: https://doi.org/10.1109/MSMC.2023.3343950
Citations: 0
Abstract
A downsampling strategy based on negative selection density clustering (NSDC-DS) is proposed to improve classifier performance when random downsampling is applied to unbalanced communication text. Negative selection, which discovers self-anomalies, is used to enhance traditional clustering: the sample center point serves as the detector and the samples to be clustered serve as the self-set, anomalous matching is performed between the two, and the NSDC technique analyzes sample similarity. To improve on the traditional downsampling method, we use the Naïve Bayes Support Vector Machine (NBSVM) classifier to identify spam in the sampled communication text, use principal component analysis (PCA) to evaluate the information content of the samples, propose an improved PCA-signed directed graph (SGD) algorithm to optimize model parameters, and complete semisupervised spam text recognition for communication over the Social Web of Things. The improved approach is compared against NSDC, NSDC-DS, PCA-SGD, and standard models on several datasets, including unbalanced communication text. In these experiments, the improved model converges faster and more consistently.
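
For orientation, the plain random downsampling baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration, not the authors' NSDC-DS method: the function name `random_downsample`, the `seed` parameter, and the toy messages are all assumptions made for the example, and it simply trims every class to the size of the rarest class.

```python
import random
from collections import Counter


def random_downsample(texts, labels, seed=0):
    """Randomly drop samples so every class keeps as many items as the rarest class.

    Hypothetical helper for illustration only; the paper's NSDC-DS strategy
    selects samples by clustering rather than uniformly at random.
    """
    rng = random.Random(seed)

    # Group the texts by their class label.
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    # The smallest class sets the target size for all classes.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        # Sample without replacement from each class.
        for text in rng.sample(items, target):
            balanced.append((text, label))

    rng.shuffle(balanced)
    return balanced


# Toy unbalanced data: eight legitimate messages and two spam messages.
texts = [f"ham message {i}" for i in range(8)] + ["win cash now", "free prize"]
labels = ["ham"] * 8 + ["spam"] * 2

balanced = random_downsample(texts, labels)
print(Counter(label for _, label in balanced))
```

With the toy data above, each class ends up with two samples. Because the majority-class samples are dropped uniformly at random, informative samples can be lost, which is the weakness a clustering-based selection is meant to address.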