Lightweight Machine Learning Classifiers of IoT Traffic Flows

2019 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO) Pub Date : 2019-07-01 DOI:10.1109/SYNCHROINFO.2019.8814156

R. Bikmukhamedov, A. Nadeev

Citations: 12

Abstract

The statistics of IoT traffic flows differ from those of traditional devices, and classifying these flows has become an important task because of the exponentially growing number of smart devices. Conventional Deep Packet Inspection systems rely on inspecting open fields in TLS and DNS packets, and the trend toward encrypting those fields makes machine-learning-based systems the only viable option for future networks. Moreover, the computational complexity of models becomes crucial for large-scale operations. In this work, we investigated whether simple models, such as Logistic Regression, an SVM with a linear kernel, and a Decision Tree, perform well enough for real-world deployment in multiclass classification of IoT traces, given thoughtful feature engineering. We introduced a new categorical flow feature that describes the set of TCP-flag fields within a flow. In addition, removing correlated features and transforming the feature space via PCA proved useful for reducing prediction complexity. To account for the online classification mode, we limited the maximum number of packets within a flow to 10. Moreover, to estimate the upper-bound performance achievable with the given features, we compared the simple algorithms with Random Forest, Gradient Boosting, and a feed-forward neural network. We performed 4-fold cross-validation of the models using the Accuracy and F1-measure metrics. The test results demonstrated that the introduced feature increases the F1-measure for logistic regression from 99.1% in the base case to 99.6%, thus closely approaching more computationally expensive models. Overall, the evaluation results demonstrated the feasibility of a lightweight model for the IoT flow classification task with performance suitable for practical deployment.
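The abstract does not specify how the categorical TCP-flag feature is encoded, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes each packet is represented by its TCP flag byte, keeps only the first 10 packets of a flow (the limit stated above), and joins the distinct flag combinations into a single category label. The names `flag_names`, `flag_set_feature`, and `MAX_PACKETS`, as well as the label format, are hypothetical.

```python
from typing import Iterable, List

# Standard TCP header flag bits (low six bits of the flag byte).
TCP_FLAGS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}

# Packet limit per flow taken from the abstract (online classification mode).
MAX_PACKETS = 10


def flag_names(flag_byte: int) -> str:
    """Render one packet's flag byte as a '+'-joined name, e.g. 'SYN+ACK'."""
    names = [name for name, bit in TCP_FLAGS.items() if flag_byte & bit]
    return "+".join(names) if names else "NONE"


def flag_set_feature(packet_flags: Iterable[int],
                     max_packets: int = MAX_PACKETS) -> str:
    """Assumed encoding (not from the paper): the sorted set of distinct
    flag combinations seen in the first `max_packets` packets, joined
    into one string that can be treated as a categorical value."""
    head: List[int] = list(packet_flags)[:max_packets]
    distinct = sorted({flag_names(f) for f in head})
    return "|".join(distinct)


# Toy flow: SYN, SYN+ACK, ACK, PSH+ACK, ACK, FIN+ACK
flow = [0x02, 0x12, 0x10, 0x18, 0x10, 0x11]
label = flag_set_feature(flow)
print(label)
```

A label produced this way could then be one-hot encoded before being passed to a linear model such as logistic regression; that step is likewise an assumption about the pipeline rather than something the abstract states.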
