Detection of DDoS attack in IoT traffic using ensemble machine learning techniques

Networks and Heterogeneous Media (IF 1.3; CAS Zone 4, Mathematics; JCR Q3, Mathematics, Interdisciplinary Applications). Pub Date: 2023-01-01. DOI: 10.3934/nhm.2023061

N. Pandey, P. K. Mishra

Citations: 2

Abstract

A denial-of-service (DoS) attack aims to exhaust the victim's resources by sending attack packets and, using various techniques, ultimately to block legitimate packets. This paper discusses the consequences of distributed denial-of-service (DDoS) attacks in various application areas of the Internet of Things (IoT). We analyze the performance of machine learning (ML)-based classifiers, including bagging and boosting techniques, for the binary classification of attack traffic. For the analysis, we use the benchmark CICDDoS2019 dataset, which covers DDoS attacks based on the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP), in order to study new kinds of attacks. Because these protocols are widely used for communication in IoT networks, this dataset is used to study DDoS attacks in the IoT domain. Because the data is highly imbalanced, it is balanced using an ensemble sampling approach that combines a random under-sampler with the ADAptive SYNthetic (ADASYN) oversampling technique. Feature selection is performed using two methods: (a) the Pearson correlation coefficient and (b) the Extra Trees classifier. Performance is then evaluated for the following ML classifiers: Random Forest (RF), Naïve Bayes (NB), support vector machine (SVM), AdaBoost, eXtreme Gradient Boosting (XGBoost) and Gradient Boosting (GB). RF gives the best performance with the least training and prediction time. For most classifiers, feature selection using the Extra Trees classifier is more efficient than the Pearson correlation coefficient method in terms of the total time required for training and prediction. For attack detection, RF combined with feature selection using the Pearson correlation coefficient gives the best performance in the least time.
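Two of the preprocessing steps named in the abstract, random under-sampling and Pearson-correlation feature selection, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the Pearson coefficient is applied, so the sketch assumes it is used to drop features that are highly correlated with an already kept feature. The 0.9 threshold, the toy feature names and all function names are hypothetical, and the ADASYN oversampling step is not reproduced here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already kept feature is below threshold.

    `columns` maps a feature name to its list of values.
    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def random_undersample(rows, labels, seed=0):
    """Randomly discard rows of larger classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data with hypothetical feature names (not CICDDoS2019 columns).
features = {
    "pkt_len": [1, 2, 3, 4, 5],
    "pkt_len_x2": [2, 4, 6, 8, 10],  # exact multiple of pkt_len, so r = 1
    "iat": [5, 1, 4, 2, 3],
}
selected = drop_correlated(features)  # -> ['pkt_len', 'iat']

# 8 attack rows (label 1) versus 2 benign rows (label 0).
balanced_rows, balanced_labels = random_undersample(list(range(10)), [1] * 8 + [0] * 2)
```

In practice the under-sampled majority class would then be combined with synthetic minority samples (ADASYN in the paper) before the classifiers are trained on the selected features.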


Source journal

Networks and Heterogeneous Media (Mathematics, Interdisciplinary Applications)

CiteScore: 1.80

Self-citation rate: 0.00%

Review time: 6-12 weeks

About the journal: NHM offers a strong combination of three features: interdisciplinary character, specific focus, and deep mathematical content. The journal also aims to create a link between the discrete and the continuous communities, which distinguishes it from other journals with a strong PDE orientation. NHM publishes original, high-quality contributions in networks, heterogeneous media and related fields. NHM is thus devoted to research on complex media arising in mathematical, physical, engineering, socio-economic and biomedical problems.