M. Sethi, Naman Tyagi, Parmeet Singh Kalsi, Parupalli Atchuta Rao
Deep Learning-based Binary Classification for Spam Detection in SMS Data: Addressing Imbalanced Data with Sampling Techniques
2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), published 2023-05-25
DOI: 10.1109/ACCAI58221.2023.10199860 (https://doi.org/10.1109/ACCAI58221.2023.10199860)
Citations: 1
Abstract
This paper presents a deep learning-based approach for detecting spam in SMS (text) data. The study uses four models, namely Dense, LSTM, Bi-LSTM, and GRU, to perform binary classification of messages as spam or non-spam. To address class imbalance, the study applies undersampling, downsampling, and SMOTE sampling techniques to a public dataset of SMS messages from the UCI datasets. The researchers also visualize the words most commonly used in spam and non-spam messages and analyze their impact on model performance. The results show that the proposed dense model achieves high accuracy in detecting spam messages on the test dataset, which suggests that the model can be useful for identifying spam in SMS.
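To make the imbalance handling mentioned in the abstract concrete, the following is a minimal sketch of random undersampling, assuming a list of messages with matching labels. It is not the authors' implementation: the function name `undersample`, the `seed` parameter, and the toy messages are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def undersample(messages, labels, seed=0):
    """Randomly drop examples so every class matches the smallest class size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group messages by their class label.
    by_label = defaultdict(list)
    for text, label in zip(messages, labels):
        by_label[label].append(text)

    # The smallest class determines how many examples each class keeps.
    target = min(len(texts) for texts in by_label.values())

    balanced = []
    for label, texts in by_label.items():
        # Sample without replacement from each class.
        for text in rng.sample(texts, target):
            balanced.append((text, label))

    # Shuffle so the classes are not grouped together.
    rng.shuffle(balanced)
    return balanced


# Toy data invented for this example: 2 spam and 4 non-spam ("ham") messages.
messages = [
    "win a free prize now",
    "see you at lunch",
    "call me later",
    "are we still on for tonight",
    "claim your cash reward",
    "running late, sorry",
]
labels = ["spam", "ham", "ham", "ham", "spam", "ham"]

balanced = undersample(messages, labels)
```

After balancing, each class contributes the same number of messages, at the cost of discarding majority-class data. SMOTE takes the opposite route and synthesizes new minority-class examples in feature space.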