Imbalanced Internet Traffic Classification Using Ensemble Framework
Phuylai Oeung, Fuke Shen
2019 International Conference on Information Networking (ICOIN), 2019-01-01
DOI: 10.1109/ICOIN.2019.8717977 (https://doi.org/10.1109/ICOIN.2019.8717977)
Citations: 3
Abstract
Machine learning (ML)-based traffic classification has gained importance as port-based and payload-based approaches have become less effective. However, most previous work relies on packet-level methods, which require additional hardware to manage and monitor, increasing both cost and the workload of network personnel. In this paper, we propose a methodology for building an efficient classifier from NetFlow, a flow-level monitoring solution widely deployed by network operators. First, we analyze per-application performance using the C4.5 decision tree with features derived from NetFlow records. The results show that the accuracy obtained is as good as that of the packet-level method. We further propose an ensemble feature selection (FS) method to improve classification accuracy and reduce computational complexity. Lastly, we present an approach that combines clustering-based under-sampling with the synthetic minority over-sampling technique (SMOTE) to address concept drift and dataset imbalance, and we extract insights and recommendations for practical application. With the proposed methods combined, the experimental results show a high F-measure on two traces containing a wide range of applications, with low computational complexity.
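
The abstract does not spell out the resampling procedure, so the following is only a minimal sketch of the general idea of pairing clustering-based under-sampling with SMOTE, not the authors' implementation. It assumes k-means as the clustering step, keeps the one real majority-class flow closest to each centroid, and generates synthetic minority flows by interpolating between a minority sample and one of its nearest minority neighbours. The function names (`kmeans`, `cluster_undersample`, `smote`), the parameter values, and the two-feature toy data are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(majority, n_keep, seed=0):
    """Shrink the majority class to at most n_keep real samples, one per cluster."""
    centroids = kmeans(majority, n_keep, seed=seed)
    kept = {
        min(range(len(majority)), key=lambda i: math.dist(majority[i], c))
        for c in centroids
    }
    return [majority[i] for i in sorted(kept)]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples; needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (assumed): 200 majority-class flows and 10 minority-class flows.
data_rng = random.Random(1)
majority = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
minority = [[data_rng.gauss(4, 0.5), data_rng.gauss(4, 0.5)] for _ in range(10)]

reduced_majority = cluster_undersample(majority, n_keep=30)
grown_minority = minority + smote(minority, len(reduced_majority) - len(minority))
```

Under-sampling by cluster keeps the retained majority flows spread across the feature space instead of dropping them at random, while SMOTE fills in the minority class without duplicating records exactly. In practice the feature vectors would come from NetFlow records and the balanced set would be passed to the classifier.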