An improved transformer‐based model for detecting phishing, spam and ham emails: A large language model approach

Security and Privacy (IF 1.5; JCR Q3, Computer Science, Information Systems) | Pub Date: 2024-04-24 | DOI: 10.1002/spy2.402
Suhaima Jamal, H. Wimmer, Iqbal H. Sarker

Abstract

Phishing and spam remain a persistent cybersecurity threat, with the majority of breaches resulting from these types of social engineering attacks. Detection has therefore been a long-standing challenge for both academic and industry researchers. New approaches are required to keep up with the growing sophistication of threat actors. One such development with vast potential is the large language model (LLM). LLMs have emerged and already demonstrated their potential to transform society and to offer new ways of solving well-established challenges. Phishing and spam have caused financial hardship and cost email users all over the world time and resources, and they frequently serve as an entry point for ransomware threat actors. While detection approaches exist, especially heuristic-based ones, LLMs offer the potential to venture into a new, unexplored area for understanding and solving this challenge. LLMs have rapidly altered the landscape across business, consumer use, and academia, and demonstrate transformational potential to profoundly impact society. Applying them to email detection is therefore a rational next step in academic research. In this work, we present IPSDM, an improved phishing spam detection model based on fine-tuning the BERT family of models specifically to detect phishing and spam emails. We demonstrate that our fine-tuned version, IPSDM, is better able to classify emails in both unbalanced and balanced datasets. Moreover, IPSDM consistently outperforms the baseline models in terms of classification accuracy, precision, recall, and F1-score, while concurrently mitigating overfitting concerns.
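The abstract reports accuracy, precision, recall, and F1-score on both unbalanced and balanced datasets. As a minimal sketch of why the per-class metrics matter alongside accuracy, the Python below computes them for a three-class phishing/spam/ham problem. This is not the authors' evaluation code: the function names, the label strings, and the ten toy labels are all assumptions made up for illustration, and macro-averaging is one common choice that the abstract does not confirm.

```python
from collections import Counter

# Hypothetical label names; the paper's actual label encoding is not given here.
LABELS = ("phishing", "spam", "ham")


def accuracy(y_true, y_pred):
    """Fraction of emails whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} computed one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Toy, made-up unbalanced sample: 8 ham, 1 spam, 1 phishing.
y_true = ["ham"] * 8 + ["spam", "phishing"]
# A degenerate classifier that labels every email as ham.
y_pred = ["ham"] * 10

acc = accuracy(y_true, y_pred)
metrics = per_class_metrics(y_true, y_pred)
print(f"accuracy = {acc:.2f}")
print(f"macro F1 = {macro_f1(metrics):.2f}")
for label, (p, r, f) in metrics.items():
    print(f"{label:9s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

On this toy sample the always-ham classifier reaches high accuracy while its recall on spam and phishing is zero, which is why results on unbalanced data are usually read through precision, recall, and F1 as well as accuracy.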