Email spam detection by deep learning models using novel feature selection technique and BERT

Egyptian Informatics Journal, published 2024-04-30, DOI: 10.1016/j.eij.2024.100473
Impact factor 5; Zone 3, Computer Science; JCR Q1, Computer Science, Artificial Intelligence
Ghazala Nasreen, Muhammad Murad Khan, Muhammad Younus, Bushra Zafar, Muhammad Kashif Hanif
Citations: 0

Abstract

The rapid advance of technology and the ease of communicating by email have led to an upsurge in unsolicited email (spam), which poses a serious threat to the global economy and to security. During model training, high-dimensional and redundant datasets can degrade classification performance and incur high memory and computation costs. Feature selection is an important data-processing technique that picks out the relevant features and subsets of information in a dataset, so choosing an efficient feature selection technique is essential for good classification performance. Moreover, most prior research has relied on traditional machine learning techniques, which struggle to handle large volumes of data and their variations, while spammers keep becoming more sophisticated as technology advances. Hybrid techniques that combine deep learning with conventional algorithms are therefore needed to address these problems. In this paper we propose a novel scheme for email spam detection that improves feature selection from the original dataset and increases classifier accuracy. We reviewed the literature to identify the efficient machine learning models that other researchers have applied to email spam detection and feature selection. Our method, GWO-BERT, achieved strong results in combination with deep learning techniques such as CNN, BiLSTM, and LSTM. We compared our models with RF and LSTM on Lingspam, a publicly available dataset. Across several experiments, GWO-BERT achieved an accuracy of 99.14%.
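The abstract does not spell out how the GWO step selects features. As a rough illustration only, assuming GWO stands for the Grey Wolf Optimizer (its usual expansion) used as a wrapper search over a binary feature mask, the sketch below runs a binary GWO in NumPy. The nearest-centroid scorer, the fitness weighting `alpha = 0.99`, the sigmoid transfer function, and all function names (`nearest_centroid_error`, `fitness`, `binary_gwo`) are hypothetical stand-ins, not the authors' implementation; the BERT and CNN/BiLSTM/LSTM components the paper pairs with this step are not reproduced here.

```python
import numpy as np


def nearest_centroid_error(X_tr, y_tr, X_va, y_va):
    """Validation error of a nearest-centroid classifier.

    Hypothetical cheap scorer used only to rank feature subsets; it is not
    one of the deep models described in the paper.
    """
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every validation row to every centroid.
    d = ((X_va[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return float(np.mean(pred != y_va))


def fitness(mask, X_tr, y_tr, X_va, y_va, alpha=0.99):
    """Assumed fitness: weighted sum of error and fraction of features kept."""
    if not mask.any():
        return 1.0  # an empty subset cannot classify anything
    err = nearest_centroid_error(X_tr[:, mask], y_tr, X_va[:, mask], y_va)
    return alpha * err + (1 - alpha) * float(mask.mean())


def binary_gwo(X_tr, y_tr, X_va, y_va, n_wolves=8, n_iter=30, seed=0):
    """Binary Grey Wolf Optimizer over feature masks (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    dim = X_tr.shape[1]
    pos = rng.random((n_wolves, dim))  # initial continuous positions in [0, 1]

    def score(p):
        return fitness(p > 0.5, X_tr, y_tr, X_va, y_va)

    scores = np.array([score(p) for p in pos])
    best_i = int(np.argmin(scores))
    best_mask, best_score = pos[best_i] > 0.5, scores[best_i]

    for t in range(n_iter):
        order = np.argsort(scores)
        leaders = pos[order[:3]].copy()  # alpha, beta and delta wolves
        a = 2 - 2 * t / n_iter  # control parameter decreases linearly from 2 to 0
        for i in range(n_wolves):
            guided = []
            for leader in leaders:
                A = 2 * a * rng.random(dim) - a
                C = 2 * rng.random(dim)
                D = np.abs(C * leader - pos[i])
                guided.append(leader - A * D)
            step = np.mean(guided, axis=0)
            # Sigmoid transfer maps the continuous step to a bit probability.
            prob = 1 / (1 + np.exp(-10 * (step - 0.5)))
            pos[i] = (rng.random(dim) < prob).astype(float)
            scores[i] = score(pos[i])
            # Keep the best subset ever seen, since leaders get overwritten.
            if scores[i] < best_score:
                best_mask, best_score = pos[i] > 0.5, scores[i]
    return best_mask, float(best_score)
```

On synthetic data in which only column 0 separates the two classes, the search keeps column 0, because any subset without it scores near chance level; `alpha` sets how strongly smaller subsets are preferred over small gains in accuracy.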

Source journal: Egyptian Informatics Journal
Subject area: Decision Sciences, Management Science and Operations Research
CiteScore: 11.10
Self-citation rate: 1.90%
Articles published: 59
Review time: 110 days
Journal description: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. It provides a forum for state-of-the-art research and development in computing, including computer science, information technology, information systems, operations research, and decision support. Innovative, previously unpublished work on subjects covered by the Journal is welcome from academic, research, or commercial sources.
Latest articles in this journal:
HD-MVCNN: High-density ECG signal based diabetic prediction and classification using multi-view convolutional neural network
A hybrid encryption algorithm based approach for secure privacy protection of big data in hospitals
A new probabilistic linguistic decision-making process based on PL-BWM and improved three-way TODIM methods
Interval valued inventory model with different payment strategies for green products under interval valued Grey Wolf optimizer Algorithm fitness function
Intelligent SDN to enhance security in IoT networks