Detecting Malicious URLs: A Semi-Supervised Machine Learning System Approach

A. Gabriel, Dragos Gavrilut, Baetu Ioan Alexandru, Adrian-Stefan Popescu
Published in: 2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2016-09-01
DOI: 10.1109/SYNASC.2016.045 (https://doi.org/10.1109/SYNASC.2016.045)
Citations: 15

Abstract

As the malware industry grows, so do the means of infecting a computer or device. One of the most common infection vectors is the Internet. Not only is this method easy to use, but because URLs come in many different forms, it is difficult to distinguish a malicious URL from a benign one. Furthermore, any system that tries to classify or detect URLs must work on a real-time stream and must respond quickly to every URL submitted for analysis (in our context, a fast response means less than 300-400 milliseconds per URL). From a malware creator's point of view, it is easy to change such URLs multiple times in one day. As a general observation, malicious URLs tend to be short-lived: they appear, serve malicious content for several hours, and are then shut down, usually by the ISP that hosts them. This paper presents a system that analyzes URLs in network traffic and can adjust its detection models to adapt to new malicious content. Every correctly classified URL is reused as part of a new dataset that acts as the backbone for new detection models. The system also uses different clustering techniques to identify the lack of features on malicious URLs, thus creating a way to improve detection of this kind of threat.
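
The feedback loop described in the abstract, in which classified URLs are fed back into the training set for the next model, resembles what is commonly called self-training. The following is a minimal sketch of that general idea, not the authors' system. Everything in it is an assumption made for illustration: the four lexical features in `url_features`, the nearest-centroid classifier, the `margin_threshold` parameter, and the use of a confidence margin as a stand-in for "correctly classified" (the abstract does not say how correctness is established).

```python
import math
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical lexical features; the abstract does not list the real ones."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    return [
        len(url) / 100.0,                                   # scaled URL length
        float(parsed.netloc.count(".")),                    # dots in the host part
        sum(c.isdigit() for c in url) / max(len(url), 1),   # fraction of digits
        1.0 if parsed.path.endswith(".exe") else 0.0,       # executable download
    ]


def fit(labeled):
    """Compute one mean feature vector (centroid) per label."""
    by_label = {}
    for url, label in labeled:
        by_label.setdefault(label, []).append(url_features(url))
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(features, centroids):
    """Return (nearest label, margin). Requires at least two labels.

    The margin is the distance to the second-nearest centroid minus the
    distance to the nearest one; a larger margin means a more confident call.
    """
    dists = {label: math.dist(features, c) for label, c in centroids.items()}
    ordered = sorted(dists, key=dists.get)
    return ordered[0], dists[ordered[1]] - dists[ordered[0]]


def self_train(labeled, unlabeled, margin_threshold=0.5, rounds=3):
    """Refit the model repeatedly, absorbing confidently classified URLs.

    Assumption: a high margin is treated as a proxy for a correct
    classification. A production system would need independent verification.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(rounds):
        if not pending:
            break
        centroids = fit(labeled)
        still_pending = []
        for url in pending:
            label, margin = classify(url_features(url), centroids)
            if margin >= margin_threshold:
                labeled.append((url, label))    # reuse in the next dataset
            else:
                still_pending.append(url)       # too uncertain to reuse
        if len(still_pending) == len(pending):
            break                               # no progress this round
        pending = still_pending
    return fit(labeled), labeled, pending


if __name__ == "__main__":
    # Made-up example URLs, for illustration only.
    seed = [
        ("http://example.com/index.html", "benign"),
        ("http://docs.python.org/3/", "benign"),
        ("http://a1b2c3.x9.example.ru/update.exe", "malicious"),
        ("http://10.20.30.40.evil.biz/payload.exe", "malicious"),
    ]
    stream = ["http://x7.y8.z9.bad.top/setup.exe", "http://www.wikipedia.org/"]
    model, grown, uncertain = self_train(seed, stream)
    print(len(grown), "labeled URLs;", len(uncertain), "left uncertain")
```

URLs that remain in the uncertain list are the ones the current features separate poorly. In this sketch they are simply returned to the caller; how the paper's clustering step handles such cases is not described in the abstract.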