TransURL: Improving malicious URL detection with multi-layer Transformer encoding and multi-scale pyramid features

IF 4.4 2区 计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Computer Networks Pub Date : 2024-08-13 DOI:10.1016/j.comnet.2024.110707
{"title":"TransURL: Improving malicious URL detection with multi-layer Transformer encoding and multi-scale pyramid features","authors":"","doi":"10.1016/j.comnet.2024.110707","DOIUrl":null,"url":null,"abstract":"<div><p>While machine learning progress is advancing the detection of malicious URLs, advanced Transformers applied to URLs face difficulties in extracting local information, character-level information, and structural relationships. To address these challenges, we propose a novel approach for malicious URL detection, named TransURL, that is implemented by co-training the character-aware Transformer with three feature modules—Multi-Layer Encoding, Multi-Scale Feature Learning, and Spatial Pyramid Attention. This special Transformer allows TransURL to extract embeddings that contain character-level information from URL token sequences, with three feature modules contributing to the fusion of multi-layer Transformer encodings and the capture of multi-scale local details and structural relationships. The proposed method is evaluated across several challenging scenarios, including class imbalance learning, multi-classification, cross-dataset testing, and adversarial sample attacks. The experimental results demonstrate a significant improvement compared to the best previous methods. For instance, it achieved a peak F1-score improvement of 40% in class-imbalanced scenarios, and exceeded the best baseline result by 14.13% in accuracy in adversarial attack scenarios. Additionally, we conduct a case study where our method accurately identifies all 30 active malicious web pages, whereas two pior SOTA methods miss 4 and 7 malicious web pages respectively. The codes and data are available at: <span><span>https://github.com/Vul-det/TransURL/</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50637,"journal":{"name":"Computer Networks","volume":null,"pages":null},"PeriodicalIF":4.4000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1389128624005395","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

While machine learning progress is advancing the detection of malicious URLs, advanced Transformers applied to URLs face difficulties in extracting local information, character-level information, and structural relationships. To address these challenges, we propose a novel approach for malicious URL detection, named TransURL, that is implemented by co-training the character-aware Transformer with three feature modules—Multi-Layer Encoding, Multi-Scale Feature Learning, and Spatial Pyramid Attention. This special Transformer allows TransURL to extract embeddings that contain character-level information from URL token sequences, with three feature modules contributing to the fusion of multi-layer Transformer encodings and the capture of multi-scale local details and structural relationships. The proposed method is evaluated across several challenging scenarios, including class imbalance learning, multi-classification, cross-dataset testing, and adversarial sample attacks. The experimental results demonstrate a significant improvement compared to the best previous methods. For instance, it achieved a peak F1-score improvement of 40% in class-imbalanced scenarios, and exceeded the best baseline result by 14.13% in accuracy in adversarial attack scenarios. Additionally, we conduct a case study where our method accurately identifies all 30 active malicious web pages, whereas two pior SOTA methods miss 4 and 7 malicious web pages respectively. The codes and data are available at: https://github.com/Vul-det/TransURL/.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
TransURL:利用多层变换器编码和多尺度金字塔特征改进恶意 URL 检测
虽然机器学习的进步推动了恶意 URL 的检测,但应用于 URL 的高级变换器在提取本地信息、字符级信息和结构关系方面面临困难。为了应对这些挑战,我们提出了一种新颖的恶意 URL 检测方法,命名为 TransURL,它是通过将字符感知变换器与三个特征模块(多层编码、多尺度特征学习和空间金字塔注意)共同训练而实现的。这种特殊的变换器允许 TransURL 从 URL 标记序列中提取包含字符级信息的嵌入,三个特征模块有助于融合多层变换器编码和捕捉多尺度局部细节和结构关系。我们在多个具有挑战性的场景中对所提出的方法进行了评估,包括类不平衡学习、多分类、跨数据集测试和对抗性样本攻击。实验结果表明,与之前的最佳方法相比,该方法有了显著改进。例如,在类不平衡场景中,它的 F1 分数峰值提高了 40%,在对抗性攻击场景中,它的准确率比最佳基线结果高出 14.13%。此外,我们还进行了一项案例研究,在这项研究中,我们的方法准确识别了所有 30 个活跃的恶意网页,而之前的两种 SOTA 方法分别漏掉了 4 个和 7 个恶意网页。代码和数据可在以下网址获取:https://github.com/Vul-det/TransURL/。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computer Networks
Computer Networks 工程技术-电信学
CiteScore
10.80
自引率
3.60%
发文量
434
审稿时长
8.6 months
期刊介绍: Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.
期刊最新文献
SD-MDN-TM: A traceback and mitigation integrated mechanism against DDoS attacks with IP spoofing On the aggregation of FIBs at ICN routers using routing strategy Protecting unauthenticated messages in LTE/5G mobile networks: A two-level Hierarchical Identity-Based Signature (HIBS) solution A two-step linear programming approach for repeater placement in large-scale quantum networks Network traffic prediction based on PSO-LightGBM-TM
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1