A novel approach for handling missing data to enhance network intrusion detection system
Mahjabeen Tahir, Azizol Abdullah, Nur Izura Udzir, Khairul Azhar Kasmiran
Cyber Security and Applications, Volume 3, Article 100063. Published 2024-07-01. DOI: 10.1016/j.csa.2024.100063
https://www.sciencedirect.com/science/article/pii/S2772918424000298
Citations: 0
Abstract
Managing missing data is a critical challenge in Intrusion Detection System (IDS) datasets, significantly affecting the performance of deep learning models. To address this issue, we introduce DeepLearning_Based_MissingData_Imputation (DMDI), a novel method designed to enhance the quality of input data by efficiently handling missing values. Our approach employs the Random Missing Value (RMV) algorithm to simulate missing data, enabling thorough testing and comparison of various imputation techniques. The DMDI method integrates a stacked denoising autoencoder with Gradient Boosting to improve imputation accuracy. We evaluated the effectiveness of our approach through three experimental phases: generating missing data, imputing missing values, and assessing imputation models. Using the NSL-KDD and UNSW-NB15 datasets, our results demonstrate significant improvements in the performance of five different classifiers (SVM, KNN, Logistic Regression, Decision Tree, and Random Forest) after imputation. On average, our method achieved accuracy improvements ranging from 0.95 to 0.97 across these classifiers compared to baseline imputation methods. Detailed analysis using Python 3 validates our findings, demonstrating enhanced model performance and robustness. This study underscores the necessity of precise missing data imputation for enhancing deep learning tasks, particularly in anomaly detection systems. It provides a reliable solution for managing missing data in IDS datasets.
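The three experimental phases named in the abstract (generating missing data, imputing missing values, and assessing the imputation) can be illustrated with a minimal Python 3 sketch. This is not the authors' RMV algorithm or the DMDI model: the function names `mask_at_random`, `impute_column_mean`, and `rmse_on_hidden` are hypothetical, cells are hidden uniformly at random as an assumed stand-in for RMV, and plain column-mean imputation stands in for the stacked denoising autoencoder with Gradient Boosting.

```python
import math
import random


def mask_at_random(rows, missing_rate, seed=0):
    """Phase 1 (assumed stand-in for RMV): hide round(missing_rate * n_cells)
    cells chosen uniformly at random. Returns the masked copy and the set of
    hidden (row, column) positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(missing_rate * len(cells))
    hidden = set(rng.sample(cells, n_missing))
    masked = [
        [None if (i, j) in hidden else v for j, v in enumerate(row)]
        for i, row in enumerate(rows)
    ]
    return masked, hidden


def impute_column_mean(masked):
    """Phase 2 (baseline only, not DMDI): replace each None with the mean of
    the observed values in the same column (0.0 if the column is empty)."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if v is None else v for j, v in enumerate(row)]
        for row in masked
    ]


def rmse_on_hidden(original, imputed, hidden):
    """Phase 3: root-mean-square error over the hidden cells only."""
    if not hidden:
        return 0.0
    squared = [(original[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy numeric table; real IDS features would come from NSL-KDD or UNSW-NB15.
    data = [[float(i + j) for j in range(4)] for i in range(10)]
    masked, hidden = mask_at_random(data, missing_rate=0.2, seed=1)
    imputed = impute_column_mean(masked)
    print("hidden cells:", len(hidden))
    print("imputation RMSE:", rmse_on_hidden(data, imputed, hidden))
```

Because the hidden positions are recorded, any other imputer can be swapped in for `impute_column_mean` and scored on exactly the same cells, which is what makes a comparison between imputation techniques fair. The abstract's final evaluation step, training classifiers on the imputed data and comparing accuracy, is not covered by this sketch.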