Development of BiLSTM deep learning model to detect URL-based phishing attacks
Authors: Öznur Şifa Akçam, Adem Tekerek, Mehmet Tekerek
Journal: Computers & Electrical Engineering, volume 123, Article 110212
Published: 2025-04-01 (Epub 2025-02-28)
DOI: 10.1016/j.compeleceng.2025.110212
URL: https://www.sciencedirect.com/science/article/pii/S0045790625001557
Publication type: Journal Article
Impact factor: 4.9 (JCR Q1, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Phishing attacks steal critical information by exploiting security vulnerabilities in information systems. This study aims to detect URL-based phishing attacks by developing a deep learning model, based on character- and word-based feature extraction, that classifies URLs as legitimate or phishing. The model was developed using the Bidirectional Long Short-Term Memory (BiLSTM) algorithm and the GramBeddings, Malicious and Benign URLs, and Ebbu2017 Phishing datasets; the Mendeley Data Web Page Phishing Detection dataset was additionally used to test it. The developed model achieved test results of 98.24% accuracy and 0.9977 area under the curve (AUC) for the GramBeddings dataset, 99.32% accuracy and 0.9986 AUC for the Malicious and Benign URLs dataset, 98.34% accuracy and 0.9981 AUC for the Ebbu2017 dataset, and 90.33% accuracy and 0.9694 AUC for the Mendeley Data Web Page Phishing Detection dataset. These results indicate that the model is effective at detecting phishing attacks. The model's distinguishing feature is that it analyses the structural patterns of URLs through character-based inference and evaluates their contextual meaning through word-based inference, which enables effective detection of phishing URLs at both the character and word levels.
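The abstract does not say how URLs are tokenized, so the following is only a minimal sketch of what character-level and word-level views of a URL could look like before they are fed to a sequence model such as a BiLSTM. All function names, the delimiter rule, the reserved ids, and the sequence lengths are assumptions made for illustration, not details taken from the paper.

```python
import re

# Hypothetical helpers for illustration only; not the authors' preprocessing.
PAD, UNK = 0, 1  # reserved ids: padding and unknown token


def char_tokens(url):
    """Character-level view: every character is one token."""
    return list(url)


def word_tokens(url):
    """Word-level view: split on any run of non-alphanumeric characters."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def build_vocab(token_lists):
    """Map each distinct token to an integer id, in first-seen order."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in tokens[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Made-up example URLs on reserved example domains.
urls = [
    "https://example.com/login",
    "http://secure-example.com.verify.example/update",
]
char_vocab = build_vocab(char_tokens(u) for u in urls)
word_vocab = build_vocab(word_tokens(u) for u in urls)

# Fixed-length integer sequences, one per URL and per view.
char_ids = [encode(char_tokens(u), char_vocab, max_len=64) for u in urls]
word_ids = [encode(word_tokens(u), word_vocab, max_len=8) for u in urls]
```

In a setup like this, the character sequence preserves structural cues such as repeated dots, hyphens, and the scheme, while the word sequence exposes tokens such as "login" or "verify". The two padded sequences would typically each pass through an embedding layer and a bidirectional recurrent layer, though how the paper combines them is not stated in the abstract.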
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.