Towards Performance of NLP Transformers on URL-Based Phishing Detection for Mobile Devices

H. Shirazi, K. Haynes, I. Ray
Journal: J. Ubiquitous Syst. Pervasive Networks
Published: 2022-08-01
DOI: 10.5383/juspn.17.01.005 (https://doi.org/10.5383/juspn.17.01.005)
Citations: 2

Abstract

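Hackers are increasingly launching phishing attacks via SMS and social media. Games and dating apps introduce yet another attack vector. However, current deep learning-based phishing detection applications are not practical on mobile devices because of their computational burden. We propose a lightweight phishing detection algorithm for mobile devices that distinguishes phishing from legitimate websites using URLs alone. As a baseline, we apply Artificial Neural Networks (ANNs) to URL-based and HTML-based website features. A model search yields 15 ANN models with accuracies >96%, comparable to state-of-the-art approaches. Next, we test deep ANNs on URL-based features only; all of these models perform poorly, with a highest accuracy of 86.2%, indicating that URL-based features alone are not adequate to detect phishing websites even with deep ANNs. Since language transformers learn to represent context-dependent text sequences, we hypothesize that they can learn directly from the text in URLs to distinguish between legitimate and malicious websites. We apply three state-of-the-art deep transformers (BERT, ELECTRA, and RoBERTa) to phishing detection. Testing custom and standard vocabularies, we find that pre-trained transformers available for immediate use (with fine-tuning) outperform the model trained with the custom URL-based vocabulary. In addition, we test MobileBERT, a thinner BERT transformer suited to lightweight devices such as mobile phones. Our results show that the evaluation metrics of this model are competitive with those of the other models in this study, yet its testing time is significantly lower, making it a candidate for embedding phishing detection in mobile phones.

Using pre-trained transformers to predict phishing websites from URLs alone has five advantages: 1) it requires little training time (230 to 320 s); 2) it is more easily updatable than feature-based approaches because no pre-processing of URLs is required; 3) it is safer to use because phishing websites can be predicted without visiting the malicious sites; 4) it is easily deployable for real-time detection and can run on mobile devices; and 5) with a mobile-specific transformer, it yields comparable performance and predicts 3 times faster than the other transformer models in this study.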
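To make the phrase "URL-based features" concrete, the following is a minimal sketch of the kind of hand-crafted lexical features that a feature-based baseline could feed to an ANN. The abstract does not list the features the authors used, so the function name `url_features` and every feature chosen here are illustrative assumptions rather than the paper's feature set.

```python
from urllib.parse import urlparse
import ipaddress


def url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    The feature set is hypothetical; it is not taken from the paper.
    Nothing is fetched: only the URL text itself is inspected.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a common phishing cue.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments, e.g. /login/verify -> 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


if __name__ == "__main__":
    for name, value in url_features("http://192.168.0.1/login/verify").items():
        print(f"{name}: {value}")
```

Features like these must be designed and recomputed whenever attackers change tactics, which is the maintenance cost the abstract contrasts with feeding raw URL text to a transformer.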
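The comparison between custom and standard vocabularies depends on how a URL is split into subword tokens before it reaches the transformer. The sketch below shows greedy longest-match (WordPiece-style) tokenization, the general scheme used by BERT-family tokenizers, applied to a URL. The vocabulary `TOY_VOCAB` and the helper names are hypothetical and chosen only for illustration; real tokenizers add text normalization and special tokens, and the abstract does not describe the authors' custom vocabulary.

```python
import re

# Hypothetical toy vocabulary; "##" marks a piece that continues a word.
TOY_VOCAB = {
    "http", "##s", ":", "/", ".", "-",
    "pay", "##pal", "secure", "login", "com",
}


def wordpiece(word: str, vocab: set, unk: str = "[UNK]") -> list:
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize_url(url: str, vocab: set) -> list:
    """Lowercase a URL, split off punctuation, then apply wordpiece()."""
    chunks = re.findall(r"[a-z0-9]+|[^a-z0-9\s]", url.lower())
    tokens = []
    for chunk in chunks:
        tokens.extend(wordpiece(chunk, vocab))
    return tokens


if __name__ == "__main__":
    print(tokenize_url("http://paypal-secure.com/login", TOY_VOCAB))
```

With this toy vocabulary, `paypal` is split into `pay` and `##pal`, while a string with no matching pieces collapses to `[UNK]`. How much of a URL survives as meaningful pieces is one way the choice of vocabulary can affect what a model is able to learn.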