A Fault Tolerant Approach for Malicious URL Filtering

Mansoor Ahmed, Abid Khan, Osama Saleem, Muhammad Haris
Published in: 2018 International Symposium on Networks, Computers and Communications (ISNCC)
Publication date: 2018-06-01
DOI: 10.1109/ISNCC.2018.8530984 (https://doi.org/10.1109/ISNCC.2018.8530984)
Citation count: 4

Abstract

Existing URL filtering mechanisms lack support for real-time processing, fault tolerance and scalability. In this paper, these issues are addressed by developing a scalable, real-time, fault-tolerant model that classifies streams of URL traffic. The key feature of our model is that it saves computation time, resource usage and bandwidth. The model is implemented in Apache Spark, which provides APIs for machine learning and streaming. The dataset consists of 2.4 million URLs drawn from both clean and malicious classes. In the training set, clean URLs are labeled 1 and malicious URLs are labeled 0. In the proposed model, distributed in-memory computation is provided in a fault-tolerant manner by Apache Spark's resilient distributed datasets (RDDs). By increasing the number of nodes in the cluster, we achieved linear scalability. Our model attained an accuracy of 96% with a logistic regression classifier and scaled well with the Apache Spark cluster. Using the logistic regression classifier from Spark MLlib, 2 million URLs can be filtered in 55 seconds. The model achieved F1-score values of 0.92, 0.95 and 0.93, reported along with precision, and the results are evaluated using cross-validation schemes.
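
Because the abstract labels clean URLs as 1 and malicious URLs as 0, it matters which class is treated as "positive" when precision and F1-score are computed. The sketch below is not the authors' code: it is a minimal, standard-library Python illustration of how those metrics follow from predictions under that label convention, and of the throughput implied by filtering 2 million URLs in 55 seconds. The function names and the toy label vectors are assumptions made for illustration only and are not data from the paper.

```python
from typing import Sequence, Tuple

# Label convention stated in the abstract: clean = 1, malicious = 0.
CLEAN, MALICIOUS = 1, 0


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def urls_per_second(n_urls: int, seconds: float) -> float:
    """Average filtering throughput."""
    return n_urls / seconds


# Toy labels, invented for illustration (hypothetical, not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

# The same predictions give different scores depending on the positive class.
clean_metrics = precision_recall_f1(y_true, y_pred, positive=CLEAN)
malicious_metrics = precision_recall_f1(y_true, y_pred, positive=MALICIOUS)

# Figures quoted in the abstract: 2 million URLs in 55 seconds.
throughput = urls_per_second(2_000_000, 55)

print("clean as positive:    ", clean_metrics)
print("malicious as positive:", malicious_metrics)
print(f"throughput: {throughput:.0f} URLs/s")  # about 36364 URLs/s
```

On the toy data, the clean class scores a precision of 0.6 and a recall of 0.75, while the malicious class scores a precision of about 0.67 and a recall of 0.5. This shows why a reported F1-score should state which class it refers to.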