A. Gabriel, Dragos Gavrilut, Baetu Ioan Alexandru, Adrian-Stefan Popescu
{"title":"检测恶意url:一种半监督机器学习系统方法","authors":"A. Gabriel, Dragos Gavrilut, Baetu Ioan Alexandru, Adrian-Stefan Popescu","doi":"10.1109/SYNASC.2016.045","DOIUrl":null,"url":null,"abstract":"As malware industry grows, so does the means of infecting a computer or device evolve. One of the most common infection vector is to use the Internet as an entry point. Not only that this method is easy to use, but due to the fact that URLs come in different forms and shapes, it is really difficult to distinguish a malicious URL from a benign one. Furthermore, every system that tries to classify or detect URLs must work on a real time stream and needs to provide a fast response for every URL that is submitted for analysis (in our context a fast response means less than 300-400 milliseconds/URL). From a malware creator point of view, it is really easy to change such URLs multiple times in one day. As a general observation, malicious URLs tend to have a short life (they appear, serve malicious content for several hours and then they are shut down usually by the ISP where they reside in). This paper aims to present a system that analyzes URLs in network traffic that is also capable of adjusting its detection models to adapt to new malicious content. Every correctly classified URL is reused as part of a new dataset that acts as the backbone for new detection models. The system also uses different clustering techniques in order to identify the lack of features on malicious URLs, thus creating a way to improve detection for this kind of threats.","PeriodicalId":268635,"journal":{"name":"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Detecting Malicious URLs: A Semi-Supervised Machine Learning System Approach\",\"authors\":\"A. Gabriel, Dragos Gavrilut, Baetu Ioan Alexandru, Adrian-Stefan Popescu\",\"doi\":\"10.1109/SYNASC.2016.045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As malware industry grows, so does the means of infecting a computer or device evolve. One of the most common infection vector is to use the Internet as an entry point. Not only that this method is easy to use, but due to the fact that URLs come in different forms and shapes, it is really difficult to distinguish a malicious URL from a benign one. Furthermore, every system that tries to classify or detect URLs must work on a real time stream and needs to provide a fast response for every URL that is submitted for analysis (in our context a fast response means less than 300-400 milliseconds/URL). From a malware creator point of view, it is really easy to change such URLs multiple times in one day. As a general observation, malicious URLs tend to have a short life (they appear, serve malicious content for several hours and then they are shut down usually by the ISP where they reside in). This paper aims to present a system that analyzes URLs in network traffic that is also capable of adjusting its detection models to adapt to new malicious content. Every correctly classified URL is reused as part of a new dataset that acts as the backbone for new detection models. The system also uses different clustering techniques in order to identify the lack of features on malicious URLs, thus creating a way to improve detection for this kind of threats.\",\"PeriodicalId\":268635,\"journal\":{\"name\":\"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)\",\"volume\":\"83 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SYNASC.2016.045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 18th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SYNASC.2016.045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Detecting Malicious URLs: A Semi-Supervised Machine Learning System Approach
As malware industry grows, so does the means of infecting a computer or device evolve. One of the most common infection vector is to use the Internet as an entry point. Not only that this method is easy to use, but due to the fact that URLs come in different forms and shapes, it is really difficult to distinguish a malicious URL from a benign one. Furthermore, every system that tries to classify or detect URLs must work on a real time stream and needs to provide a fast response for every URL that is submitted for analysis (in our context a fast response means less than 300-400 milliseconds/URL). From a malware creator point of view, it is really easy to change such URLs multiple times in one day. As a general observation, malicious URLs tend to have a short life (they appear, serve malicious content for several hours and then they are shut down usually by the ISP where they reside in). This paper aims to present a system that analyzes URLs in network traffic that is also capable of adjusting its detection models to adapt to new malicious content. Every correctly classified URL is reused as part of a new dataset that acts as the backbone for new detection models. The system also uses different clustering techniques in order to identify the lack of features on malicious URLs, thus creating a way to improve detection for this kind of threats.