A Hybrid Phishing Detection System Using Deep Learning-based URL and Content Analysis

Elektronika Ir Elektrotechnika · Pub Date: 2022-10-26 · DOI: 10.5755/j02.eie.31197 · IF 0.9 · Zone 4 (Engineering Technology) · JCR Q4 (Engineering, Electrical & Electronic)

Mehmet Korkmaz, Emre Kocyigit, O. K. Sahingoz, B. Diri

Citations: 1

Abstract

Phishing is among the attack types cybercriminals favor most, because social networks and, in particular, email messages let them reach a large number of victims easily. To protect end users, most security mechanisms check Uniform Resource Locator (URL) addresses, since URL checks are simple to implement and fast to execute. However, against sophisticated attackers, URL-only checking can miss some phishing attacks and has a relatively high false positive rate. This research proposes a hybrid technique that uses not only URL features but also content-based features as a second level of detection, improving the accuracy of the detection system while also minimizing the number of false positives. Additionally, most phishing detection algorithms use datasets whose samples, whether phishing or legitimate, are easy to tell apart. To implement a more secure protection mechanism, we aimed to collect a larger, high-risk dataset. The proposed approaches were tested on this High-Risk URL and Content-Based Phishing Detection Dataset, which contains only suspicious websites from PhishTank. In the experiments, an accuracy of 98.37 percent was achieved on this more realistic dataset for phishing detection.
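
The two-level idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper uses deep learning models, while the scorers below (`toy_url_score`, `toy_content_score`) are hand-written stand-ins. The thresholds `low` and `high`, the page stub `fetch_stub`, and the policy of consulting page content only when the URL score is inconclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class Verdict:
    label: str    # "phishing" or "legitimate"
    stage: str    # which level decided: "url" or "content"
    score: float  # score from the deciding level, in [0, 1]


def hybrid_classify(
    url: str,
    url_score: Callable[[str], float],
    content_score: Callable[[str], float],
    fetch_content: Callable[[str], str],
    low: float = 0.2,    # assumed threshold, not taken from the paper
    high: float = 0.75,  # assumed threshold, not taken from the paper
) -> Verdict:
    """Level 1 scores the URL; level 2 inspects content only if level 1 is unsure."""
    s = url_score(url)
    if s >= high:
        return Verdict("phishing", "url", s)
    if s <= low:
        return Verdict("legitimate", "url", s)
    # Inconclusive URL score: fall through to the slower content check.
    c = content_score(fetch_content(url))
    return Verdict("phishing" if c >= 0.5 else "legitimate", "content", c)


def toy_url_score(url: str) -> float:
    """Hypothetical stand-in for a trained URL model."""
    host = urlparse(url).hostname or ""
    signals = [
        "@" in url,                       # userinfo trick that hides the real host
        host.replace(".", "").isdigit(),  # raw IPv4 address instead of a domain
        host.count("-") >= 2,             # several hyphens in the host name
        len(url) > 75,                    # unusually long URL
    ]
    return sum(signals) / len(signals)


def toy_content_score(html: str) -> float:
    """Hypothetical stand-in for a trained content model."""
    text = html.lower()
    signals = ['type="password"' in text, "verify your account" in text]
    return sum(signals) / len(signals)


# Stubbed page store so the example runs without network access.
PAGES = {
    "http://secure-login-update.example.net/":
        '<form><input type="password"></form> Verify your account',
}


def fetch_stub(url: str) -> str:
    return PAGES.get(url, "")


if __name__ == "__main__":
    for u in ("https://example.com/", "http://secure-login-update.example.net/"):
        print(u, hybrid_classify(u, toy_url_score, toy_content_score, fetch_stub))
```

Passing the scorers in as callables keeps the cascade logic separate from whatever models sit behind it, so trained URL and content classifiers could replace the toy functions without changing `hybrid_classify`.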

Source journal

Elektronika Ir Elektrotechnika (Engineering Technology; Engineering: Electrical & Electronic)

CiteScore: 2.40

Self-citation rate: 7.70%

Review time: 24 months

Journal description: The journal aims to attract original research papers featuring practical developments in the field of electronics and electrical engineering. It seeks to publish research progress in this field in as much detail as possible, with an emphasis on the applied rather than the theoretical. The journal publishes regular papers dealing with, but not limited to, the following areas: Electronics; Electronic Measurements; Signal Technology; Microelectronics; High Frequency Technology, Microwaves; Electrical Engineering; Renewable Energy; Automation, Robotics; Telecommunications Engineering.