Lightweight Machine Learning Classifiers of IoT Traffic Flows
R. Bikmukhamedov, A. Nadeev
2019 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), published 2019-07-01
DOI: 10.1109/SYNCHROINFO.2019.8814156 (https://doi.org/10.1109/SYNCHROINFO.2019.8814156)
Citations: 12
Abstract
IoT traffic flows have statistics that differ from those of traditional devices, and their classification has become an important task because of the exponentially growing number of smart devices. Conventional Deep Packet Inspection systems rely on inspection of open fields in TLS and DNS packets, and the trend of encrypting those fields makes machine-learning-based systems the only viable option for future networks. Moreover, the computational complexity of models becomes crucial for large-scale operations. In this work, we investigated whether simple models, such as Logistic Regression, SVM with a linear kernel, and a Decision Tree, achieve performance suitable for real-world deployments in multiclass classification of IoT traces, given thoughtful feature engineering. We introduced a new categorical flow feature that describes the set of TCP-flag fields within a flow. In addition, removal of correlated features and feature-space transformation via PCA proved useful for reducing prediction complexity. To account for the online classification mode, we limited the maximum number of packets within a flow to 10. Moreover, to estimate the upper-bound performance with the given features, we compared the simple algorithms with Random Forest, Gradient Boosting, and a feed-forward neural network. We performed 4-fold cross-validation of the models using the Accuracy and F1-measure metrics. The test results demonstrated that the introduced feature increases the F1-measure for logistic regression from 99.1% in the base case to 99.6%, thus closely approaching more computationally expensive models. Overall, the evaluation results demonstrated the feasibility of a lightweight model for the IoT flow classification task, with performance suitable for practical deployment.
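The abstract does not specify how the categorical TCP-flag feature is encoded, so the following is only a minimal sketch of one plausible reading: collect the set of TCP flags observed in the first 10 packets of a flow and turn that set into a single category label, which can then be one-hot encoded for a linear model. The names `flag_set_category`, `one_hot`, and `FLAG_ORDER`, the `|`-joined label format, and the input layout (a list of flag names per packet) are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Iterable, List

MAX_PACKETS = 10  # per-flow packet limit stated in the abstract

# Fixed flag order, used only to make the label deterministic.
# The ordering and label format are assumptions of this sketch.
FLAG_ORDER = ("FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR")


def flag_set_category(packet_flags: Iterable[Iterable[str]],
                      max_packets: int = MAX_PACKETS) -> str:
    """Return one categorical label for the set of TCP flags seen in a flow.

    packet_flags holds, for each packet, the names of the TCP flags set.
    Only the first max_packets packets are inspected; flag names not
    listed in FLAG_ORDER are ignored.
    """
    seen = set()
    for i, flags in enumerate(packet_flags):
        if i >= max_packets:
            break
        seen.update(flags)
    ordered = [f for f in FLAG_ORDER if f in seen]
    return "|".join(ordered) if ordered else "NONE"


def one_hot(categories: List[str]) -> List[List[int]]:
    """One-hot encode category labels (columns follow sorted label order)."""
    vocab = sorted(set(categories))
    index = {c: j for j, c in enumerate(vocab)}
    return [[1 if index[c] == j else 0 for j in range(len(vocab))]
            for c in categories]


# Two hypothetical flows: a short data exchange and a reset connection.
flow_a = [["SYN"], ["SYN", "ACK"], ["ACK"], ["PSH", "ACK"]]
flow_b = [["SYN"], ["RST"]]
labels = [flag_set_category(flow_a), flag_set_category(flow_b)]
encoded = one_hot(labels)
```

Here `labels` is `["SYN|PSH|ACK", "SYN|RST"]`, and each distinct flag set becomes its own one-hot column. Treating the whole set as a single category, rather than as eight independent flag bits, is a design choice of this sketch; it lets a linear model assign a separate weight to each observed flag combination.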