{"title":"A hybrid heuristics-statistical peer-to-peer traffic classifier","authors":"M. M. Hassan, M. N. Marsono","doi":"10.1109/ICCSII.2012.6454475","journal":{"name":"2012 International Conference on Computer Systems and Industrial Informatics","volume":"453 1","pages":"0"},"publicationDate":"2012-12-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"4","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICCSII.2012.6454475"}
A hybrid heuristics-statistical peer-to-peer traffic classifier
Peer-to-peer (P2P) traffic consumes a significant share of Internet bandwidth and therefore requires effective control. This work proposes a novel hybrid heuristics-statistical approach to classifying P2P traffic. The heuristics approach detects P2P traffic with high accuracy, but it requires measuring and analyzing many correlations between packets and flows over a certain duration, which makes it unsuitable for online P2P traffic classification. Statistical classification, on the other hand, can classify traffic online, but it needs periodic, often manual, retraining. The proposed hybrid solution merges the two approaches: offline generation of a learning corpus by heuristics, and online statistical classification. In the first part, heuristics classify traffic flows into three classes, two of which are later used to train the online statistical classifier. This work enhances the existing heuristics-based P2P classification by adding a new class for unknown traffic. Analyses of offline traces using the improved heuristics show that adding the third class reduces class noise from 7% to 2%, thereby providing quality examples for retraining the online statistical classifier. In the second part, machine learning (ML) algorithms classify traffic on the fly based on flow and packet statistics. Using examples generated by the heuristics classifier, the overall statistical classification accuracy is 99%, based on analysis of downloaded and captured traces.
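The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the heuristic rules and their thresholds, the field names (`distinct_peers`, `uses_tcp_and_udp`, `well_known_port`, `stats`), the choice of P2P and non-P2P as the two training classes, the two flow statistics, and the nearest-centroid model, which merely stands in for whichever ML algorithms the authors evaluated.

```python
from collections import defaultdict
from statistics import mean

P2P, NON_P2P, UNKNOWN = "p2p", "non_p2p", "unknown"


def heuristic_label(flow):
    """Offline stage: hypothetical rules, NOT the paper's actual heuristics."""
    # Assumption: each flow record carries attributes that can only be
    # computed offline, after observing the trace for some time.
    if flow["distinct_peers"] >= 10 and flow["uses_tcp_and_udp"]:
        return P2P
    if flow["distinct_peers"] <= 2 and flow["well_known_port"]:
        return NON_P2P
    return UNKNOWN  # ambiguous flows go to the third class


def build_corpus(flows):
    """Keep only the two confidently labeled classes as training examples."""
    corpus = []
    for flow in flows:
        label = heuristic_label(flow)
        if label != UNKNOWN:
            corpus.append((flow["stats"], label))
    return corpus


def train_centroids(corpus):
    """Compute one mean feature vector per class (nearest-centroid stand-in)."""
    grouped = defaultdict(list)
    for stats, label in corpus:
        grouped[label].append(stats)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def classify_online(centroids, stats):
    """Online stage: label a flow from its statistics alone."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], stats))


# Made-up trace; "stats" is (mean packet size in bytes, mean inter-arrival time in s).
trace = [
    {"distinct_peers": 25, "uses_tcp_and_udp": True, "well_known_port": False, "stats": (900.0, 0.02)},
    {"distinct_peers": 40, "uses_tcp_and_udp": True, "well_known_port": False, "stats": (1100.0, 0.03)},
    {"distinct_peers": 1, "uses_tcp_and_udp": False, "well_known_port": True, "stats": (200.0, 0.50)},
    {"distinct_peers": 2, "uses_tcp_and_udp": False, "well_known_port": True, "stats": (300.0, 0.40)},
    {"distinct_peers": 5, "uses_tcp_and_udp": False, "well_known_port": False, "stats": (600.0, 0.20)},
]

corpus = build_corpus(trace)     # the ambiguous fifth flow is excluded
model = train_centroids(corpus)  # retraining only needs a fresh corpus
print(classify_online(model, (1000.0, 0.025)))
```

The point of the sketch is the division of labor: flows the heuristics cannot label confidently never reach the training corpus, which is how a third class can lower class noise, and the online stage needs only per-flow statistics. The features are left unscaled here for brevity; a real classifier would likely normalize them.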