Email spam detection by deep learning models using novel feature selection technique and BERT

IF 5 3区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Egyptian Informatics Journal Pub Date : 2024-04-30 DOI:10.1016/j.eij.2024.100473

Ghazala Nasreen, Muhammad Murad Khan, Muhammad Younus, Bushra Zafar, Muhammad Kashif Hanif

{"title":"Email spam detection by deep learning models using novel feature selection technique and BERT","authors":"Ghazala Nasreen, Muhammad Murad Khan, Muhammad Younus, Bushra Zafar, Muhammad Kashif Hanif","doi":"10.1016/j.eij.2024.100473","DOIUrl":null,"url":null,"abstract":"<div><p>Due to the influx of advancements in technology and the increased simplicity of communication through emails, there has been a severe threat to the global economy and security due to upsurge in volume of unsolicited During the training of models, high-dimensional and redundant datasets may reduce the classification results of the model due to high memory costs and high computation. An important data processing technique is feature selection which helps in selecting relevant features and subsets of information from the dataset. Therefore, choosing efficient feature selection techniques is very important for the best performance of classification of a model. Moreover, most of the research has been performed using traditional machine learning techniques, which are not enough to deal with the huge amount of data and its variations. Also,spammers are becoming smarter with technological advancement. Therefore, there is a need for hybrid techniques consisting of deep learning and conventional algorithms to cope with these problems. We have proposed a novel scheme in this paper for email spam detection, which will result in an improved feature selection approach from the original dataset and increase the accuracy of the classifier as well. The literature has been studied to explore the efficient machine learning models that have been applied by different researchers for email spam detection and feature selection to acquire the best results. Our method, GWO-BERT, has given remarkable results with deep learning techniques such as CNN, biLSTM and LSTM. We have compared our models with RF and LSTM and used dataset: “Lingspam,” which is a publicly available dataset. With different experiments, our technique, GWO-BERT, obtained 99.14% accuracy, which is almost equal to 100 percent.</p></div>","PeriodicalId":56010,"journal":{"name":"Egyptian Informatics Journal","volume":null,"pages":null},"PeriodicalIF":5.0000,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1110866524000367/pdfft?md5=6c116c46366f074ba8f51cb6e2b18e31&pid=1-s2.0-S1110866524000367-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Egyptian Informatics Journal","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1110866524000367","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Due to the influx of advancements in technology and the increased simplicity of communication through emails, there has been a severe threat to the global economy and security due to upsurge in volume of unsolicited During the training of models, high-dimensional and redundant datasets may reduce the classification results of the model due to high memory costs and high computation. An important data processing technique is feature selection which helps in selecting relevant features and subsets of information from the dataset. Therefore, choosing efficient feature selection techniques is very important for the best performance of classification of a model. Moreover, most of the research has been performed using traditional machine learning techniques, which are not enough to deal with the huge amount of data and its variations. Also, spammers are becoming smarter with technological advancement. Therefore, there is a need for hybrid techniques consisting of deep learning and conventional algorithms to cope with these problems. We have proposed a novel scheme in this paper for email spam detection, which will result in an improved feature selection approach from the original dataset and increase the accuracy of the classifier as well. The literature has been studied to explore the efficient machine learning models that have been applied by different researchers for email spam detection and feature selection to acquire the best results. Our method, GWO-BERT, has given remarkable results with deep learning techniques such as CNN, biLSTM and LSTM. We have compared our models with RF and LSTM and used dataset: “Lingspam,” which is a publicly available dataset. With different experiments, our technique, GWO-BERT, obtained 99.14% accuracy, which is almost equal to 100 percent.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

使用新型特征选择技术和 BERT 的深度学习模型检测垃圾邮件

由于技术的不断进步和电子邮件通信的日益简便，未经请求的邮件数量激增，对全球经济和安全构成了严重威胁。特征选择是一种重要的数据处理技术，有助于从数据集中选择相关特征和信息子集。因此，选择高效的特征选择技术对模型的最佳分类性能非常重要。此外，大多数研究都是使用传统的机器学习技术进行的，这些技术不足以处理海量数据及其变化。而且，随着技术的进步，垃圾邮件发送者也变得越来越聪明。因此，需要由深度学习和传统算法组成的混合技术来应对这些问题。我们在本文中提出了一种用于垃圾邮件检测的新方案，该方案将改进原始数据集的特征选择方法，并提高分类器的准确性。我们对文献进行了研究，以探索不同研究人员应用于垃圾邮件检测和特征选择的高效机器学习模型，从而获得最佳结果。我们的方法 GWO-BERT 在使用 CNN、biLSTM 和 LSTM 等深度学习技术后取得了显著效果。我们将我们的模型与 RF 和 LSTM 进行了比较，并使用了数据集："Lingspam"，这是一个公开可用的数据集。通过不同的实验，我们的 GWO-BERT 技术获得了 99.14% 的准确率，几乎等于 100%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Egyptian Informatics Journal Decision Sciences-Management Science and Operations Research

CiteScore

11.10

自引率

1.90%

发文量

审稿时长

110 days

期刊介绍： The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. This Journal provides a forum for the state-of-the-art research and development in the fields of computing, including computer sciences, information technologies, information systems, operations research and decision support. Innovative and not-previously-published work in subjects covered by the Journal is encouraged to be submitted, whether from academic, research or commercial sources.