A Scalable and Stacked Ensemble Approach to Improve Intrusion Detection in Clouds
Authors: Mohd. Rehan Ghazi, N. S. Raghava
Journal: Information Technology and Control (journal article), published 2023-12-22
DOI: 10.5755/j01.itc.52.4.32042
Abstract
The availability of automated data collection techniques and the growing volume of data collected from cloud network traffic and cloud resource activity have created a big data challenge, which calls for big data tools to handle, manage, and interpret the data. A single classification method may fail to perform well on data of this volume. Although stacking-based ensemble Machine Learning (ML) methods are more complex and consume more computational resources, research shows that they classify data better than single classifiers. This research proposes two Intrusion Detection Systems (IDSs), both based on an ensemble of ML algorithms built on the Stacked Generalization Approach (SGA) and on big data technology. The proposed approaches are tested and assessed on the NSL-KDD and UNSW-NB15 datasets, using a Gain Ratio (GR) based Feature Selection (FS) approach; the J48, OneR, Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron (MLP), and Extreme Gradient Boosting (XGBoost) classifiers; and Apache Spark, a prominent big data processing platform. The first technique involves storing data on HDFS (the Hadoop Distributed File System), while the second involves selecting the most suitable subset of base classifiers for stacking. A thorough performance investigation reveals that our proposed model outperforms other current IDS models in terms of accuracy, false positive rate (FPR), or other performance metrics when detecting intrusions in the cloud.
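The abstract names Gain Ratio (GR) as the feature-selection criterion but does not give its definition. The sketch below assumes the standard C4.5 definition, GR(A) = InfoGain(A) / SplitInfo(A), where SplitInfo(A) is the entropy of the feature's own value distribution, and assumes discrete-valued features (numeric NSL-KDD or UNSW-NB15 attributes would need discretizing first). The function names, the toy flow records, and the top-`k` cutoff are hypothetical illustrations, not the authors' code; the paper's actual selection threshold is not stated here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split information."""
    n = len(labels)
    # Group class labels by the feature value they co-occur with.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # Conditional entropy H(Y | X), weighted by partition size.
    cond = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    info_gain = entropy(labels) - cond
    # Split information = entropy of the feature itself; it penalizes
    # features with many distinct values (e.g. record IDs).
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(rows, labels, names, k):
    """Return the k feature names with the highest gain ratio (hypothetical top-k rule)."""
    columns = list(zip(*rows))
    scores = {name: gain_ratio(list(col), labels) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy records: (protocol, flag); labels mark each record as normal or attack.
    labels = ["normal", "normal", "attack", "attack"]
    protocol = ["tcp", "udp", "tcp", "udp"]
    flag = ["SF", "SF", "REJ", "REJ"]
    print(gain_ratio(flag, labels))          # -> 1.0 (flag separates the classes)
    print(gain_ratio(protocol, labels))      # -> 0.0 (protocol carries no class information)
    print(gain_ratio([1, 2, 3, 4], labels))  # -> 0.5 (full gain, halved by split info)
    rows = list(zip(protocol, flag))
    print(rank_features(rows, labels, ["protocol", "flag"], 1))  # -> ['flag']
```

The unique-ID column shows why gain ratio is used instead of plain information gain: the ID has the same information gain as `flag` (1 bit), but its higher split information halves its score, so it ranks lower.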
About the journal:
The journal covers a wide range of problems in computer science and control systems, including:
-Software and hardware engineering;
-Management systems engineering;
-Information systems and databases;
-Embedded systems;
-Physical systems modelling and application;
-Computer networks and cloud computing;
-Data visualization;
-Human-computer interface;
-Computer graphics, visual analytics, and multimedia systems.