An improved transformer‐based model for detecting phishing, spam and ham emails: A large language model approach

IF 1.5 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Security and Privacy Pub Date : 2024-04-24 DOI:10.1002/spy2.402

Suhaima Jamal, H. Wimmer, Iqbal H. Sarker


Citations: 0

Abstract

Phishing and spam have long been a cybersecurity threat, with the majority of breaches resulting from these types of social engineering attacks. Detection has therefore been a long‐standing challenge for both academic and industry researchers. New and innovative approaches are required to keep up with the growing sophistication of threat actors. One such innovation with vast potential is the large language model (LLM). LLMs have emerged and have already demonstrated their potential to transform society and to provide new approaches to well‐established challenges. Phishing and spam have caused financial hardship and lost time and resources for email users all over the world, and they frequently serve as an entry point for ransomware threat actors. While detection approaches exist, especially heuristic‐based ones, LLMs offer the potential to venture into a new, unexplored area for understanding and solving this challenge. LLMs have rapidly altered the landscape for businesses, consumers, and academia, and demonstrate transformational potential to profoundly impact society. Based on this, applying these approaches to email detection is a rational next step in academic research. In this work, we present IPSDM, an improved phishing spam detection model based on fine‐tuning the BERT family of models specifically to detect phishing and spam emails. We demonstrate that our fine‐tuned version, IPSDM, is able to better classify emails in both unbalanced and balanced datasets. Moreover, IPSDM consistently outperforms the baseline models in terms of classification accuracy, precision, recall, and F1‐score, while also mitigating overfitting concerns.
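
The abstract compares models on accuracy, precision, recall, and F1‐score for a three-class task. As a rough illustration of how such metrics can be computed, the sketch below is a minimal, dependency-free example. It is not the authors' evaluation code: the function names, the macro-averaging choice, and the toy labels are all assumptions made for illustration, and only the class names come from the paper's title.

```python
from collections import Counter

# Class names taken from the paper's title; everything else is hypothetical.
LABELS = ("phishing", "spam", "ham")


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for a multi-class task."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an example of the true class
    metrics = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) if y_true else 0.0


def macro_f1(metrics):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Toy, made-up labels purely for illustration (not data from the paper).
y_true = ["phishing", "spam", "ham", "ham", "phishing", "spam"]
y_pred = ["phishing", "ham", "ham", "ham", "spam", "spam"]

if __name__ == "__main__":
    scores = per_class_metrics(y_true, y_pred)
    for label, (p, r, f1) in scores.items():
        print(f"{label:9s} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
    print(f"accuracy={accuracy(y_true, y_pred):.2f} macro_f1={macro_f1(scores):.2f}")
```

Macro averaging weights each class equally, which is one common choice when a dataset is unbalanced; whether the paper reports macro, micro, or weighted averages is not stated in the abstract.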

Source journal

Security and Privacy Multiple-

Self-citation rate

5.30%

Publication volume