Detecting Malicious URLs: A Semi-Supervised Machine Learning System Approach

2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Pub Date: 2016-09-01, DOI: 10.1109/SYNASC.2016.045

A. Gabriel, Dragos Gavrilut, Baetu Ioan Alexandru, Adrian-Stefan Popescu


Citations: 15

Abstract




As the malware industry grows, so do the means of infecting a computer or device. One of the most common infection vectors is the Internet. Not only is this method easy to use, but because URLs come in many different forms, it is difficult to distinguish a malicious URL from a benign one. Furthermore, any system that tries to classify or detect URLs must work on a real-time stream and provide a fast response for every URL submitted for analysis (in our context, a fast response means less than 300-400 milliseconds per URL). From a malware creator's point of view, it is easy to change such URLs multiple times in one day. As a general observation, malicious URLs tend to have a short life: they appear, serve malicious content for several hours, and are then shut down, usually by the ISP that hosts them. This paper presents a system that analyzes URLs in network traffic and is capable of adjusting its detection models to adapt to new malicious content. Every correctly classified URL is reused as part of a new dataset that acts as the backbone for new detection models. The system also uses different clustering techniques to identify the lack of features on malicious URLs, thus creating a way to improve detection of this kind of threat.
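The feedback loop described in the abstract, in which classified URLs are folded back into the dataset used to build the next model, resembles what is commonly called self-training. The abstract does not name the features, the classifier, or the acceptance rule, so the following is only a minimal sketch under stated assumptions: lexical tokens as features, a small naive Bayes model, and a confidence threshold standing in for "correctly classified". All names (`url_tokens`, `NaiveBayesUrlModel`, `self_train`) and the example URLs are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def url_tokens(url):
    """Split a URL into lowercase alphanumeric tokens (an assumed feature set)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayesUrlModel:
    """Tiny multinomial naive Bayes over URL tokens with add-one smoothing."""

    def __init__(self, labeled):
        self.token_counts = {"malicious": Counter(), "benign": Counter()}
        self.doc_counts = Counter()
        for url, label in labeled:
            self.doc_counts[label] += 1
            self.token_counts[label].update(url_tokens(url))
        self.vocab = set(self.token_counts["malicious"]) | set(self.token_counts["benign"])

    def _log_score(self, tokens, label):
        counts = self.token_counts[label]
        total_docs = sum(self.doc_counts.values())
        denom = sum(counts.values()) + len(self.vocab) + 1
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        return score

    def malicious_probability(self, url):
        tokens = url_tokens(url)
        diff = self._log_score(tokens, "benign") - self._log_score(tokens, "malicious")
        diff = max(-700.0, min(700.0, diff))  # keep math.exp within float range
        return 1.0 / (1.0 + math.exp(diff))


def self_train(labeled, unlabeled, threshold=0.9, rounds=3):
    """Retrain repeatedly, promoting confidently classified URLs to the dataset.

    Assumption: high confidence is used as a proxy for "correctly classified",
    because no ground truth exists for the unlabeled stream.
    """
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(rounds):
        model = NaiveBayesUrlModel(labeled)
        still_pending = []
        for url in pending:
            p = model.malicious_probability(url)
            if p >= threshold:
                labeled.append((url, "malicious"))
            elif p <= 1.0 - threshold:
                labeled.append((url, "benign"))
            else:
                still_pending.append(url)
        if len(still_pending) == len(pending):
            break  # nothing new was learned in this round
        pending = still_pending
    return NaiveBayesUrlModel(labeled), labeled, pending


# Hypothetical seed data and traffic, for illustration only.
seed = [
    ("http://login-verify.example/update.php", "malicious"),
    ("http://secure-login.example/verify.php", "malicious"),
    ("https://docs.example/guide/index.html", "benign"),
    ("https://news.example/articles/index.html", "benign"),
]
traffic = [
    "http://login.example/verify.php",
    "https://docs.example/index.html",
    "ftp://unknown.test/x",
]
model, labeled, pending = self_train(seed, traffic)
```

URLs that stay in `pending` are the ones the current features cannot separate. A real deployment would probably need a stricter acceptance rule than a confidence threshold, since self-training can reinforce its own mistakes.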

