Improving Phishing Website Detection using a Hybrid Two-level Framework for Feature Selection and XGBoost Tuning

IF 1 | Zone 4, Computer Science | Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Journal of Web Engineering | Pub Date: 2023-03-01 | DOI: 10.13052/jwe1540-9589.2237
Luka Jovanovic;Dijana Jovanovic;Milos Antonijevic;Bosko Nikolic;Nebojsa Bacanin;Miodrag Zivkovic;Ivana Strumberger
Journal of Web Engineering, vol. 22, no. 3, pp. 543-574. Full text: https://ieeexplore.ieee.org/document/10247501/
Citations: 1

Abstract

Over the last few decades, the World Wide Web has become a necessity that offers numerous services to end users. The number of online transactions increases daily, and so does the number of malicious actors. Machine learning plays a vital role in the majority of modern solutions. To further improve Web security, this paper proposes a hybrid approach based on the eXtreme Gradient Boosting (XGBoost) machine learning model, optimized by an improved version of a well-known metaheuristic, the firefly algorithm. The improved firefly algorithm is employed in a two-tier framework, also developed as part of this research, to perform both feature selection and tuning of the XGBoost hyperparameters. The introduced hybrid model is evaluated on three well-known, publicly available phishing website datasets: the first two were provided by Mendeley Data, while the third was acquired from the University of California, Irvine machine learning repository. The performance of the newly introduced algorithms is additionally compared against cutting-edge metaheuristics used in the same framework. The best-performing models were also subjected to SHapley Additive exPlanations (SHAP) analysis to determine the impact of each feature on model decisions. The obtained results suggest that the proposed hybrid solution achieves superior performance in comparison to the other approaches and that it represents a promising solution in the domain of web security.
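The two-tier idea in the abstract (one optimizer handling both feature selection and hyperparameter tuning) can be illustrated with a single solution vector whose first part encodes a feature mask and whose second part encodes hyperparameter values. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the basic firefly algorithm rather than the improved variant proposed in the paper, and a toy fitness function stands in for the classification error of an XGBoost model. The names `decode`, `firefly_minimize`, and `toy_fitness`, the 0.5 selection threshold, the parameter bounds, and the "informative" feature pattern are all hypothetical choices made for illustration.

```python
import math
import random


def decode(position, n_features, param_bounds):
    """Split a flat vector in [0, 1]^d into a feature mask and hyperparameters.

    The first n_features entries are thresholded at 0.5 (assumed rule) to
    select features; the remaining entries are rescaled to the given bounds.
    """
    mask = [x > 0.5 for x in position[:n_features]]
    params = {
        name: lo + x * (hi - lo)
        for x, (name, (lo, hi)) in zip(position[n_features:], param_bounds.items())
    }
    return mask, params


def firefly_minimize(fitness, dim, n_fireflies=10, n_iter=30,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Basic firefly algorithm on the unit hypercube (lower fitness = brighter)."""
    rng = random.Random(seed)
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    scores = [fitness(p) for p in swarm]
    best_i = min(range(n_fireflies), key=scores.__getitem__)
    best_pos, best_score = list(swarm[best_i]), scores[best_i]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if scores[j] < scores[i]:  # j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [
                        min(1.0, max(0.0, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    scores[i] = fitness(swarm[i])
                    if scores[i] < best_score:
                        best_pos, best_score = list(swarm[i]), scores[i]
    return best_pos, best_score


# Hypothetical problem setup: 6 candidate features and 2 XGBoost-style parameters.
N_FEATURES = 6
PARAM_BOUNDS = {"learning_rate": (0.01, 0.9), "max_depth": (3.0, 10.0)}


def toy_fitness(position):
    """Stand-in for the error of a classifier trained on the selected features."""
    mask, params = decode(position, N_FEATURES, PARAM_BOUNDS)
    informative = [True, True, False, False, True, False]  # made-up ground truth
    mismatch = sum(m != t for m, t in zip(mask, informative))
    depth = round(params["max_depth"])  # tree depth must be an integer
    return mismatch + abs(params["learning_rate"] - 0.3) + abs(depth - 6) / 7.0


best_pos, best_score = firefly_minimize(toy_fitness, dim=N_FEATURES + len(PARAM_BOUNDS))
print(decode(best_pos, N_FEATURES, PARAM_BOUNDS), best_score)
```

In a real experiment, `toy_fitness` would be replaced by a function that trains a classifier on the masked feature columns with the decoded hyperparameters and returns its validation error. The encoding and the search loop would stay the same.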
Source journal: Journal of Web Engineering (Engineering and Technology, Computer Science: Theory and Methods)
CiteScore: 1.80
Self-citation rate: 12.50%
Articles published: 62
Review time: 9 months
Journal description: The World Wide Web and its associated technologies have become a major implementation and delivery platform for a large variety of applications, ranging from simple institutional information Web sites to sophisticated supply-chain management systems, financial applications, e-government, distance learning, and entertainment, among others. Such applications, in addition to their intrinsic functionality, also exhibit the more complex behavior of distributed applications.