A Novel Deep Clustering Variational Auto-Encoder for Anomaly-based Network Intrusion Detection

2022 14th International Conference on Knowledge and Systems Engineering (KSE). Pub Date: 2022-10-19. DOI: 10.1109/KSE56063.2022.9953763

Van Quan Nguyen, V. H. Nguyen, T. Hoang, Nathan Shone


Citations: 4

Abstract

The role of semi-supervised network intrusion detection systems is becoming increasingly important in the ever-changing digital landscape. Despite the boom in commercial and research interest, many concerns over accuracy have yet to be addressed. Two major limitations contribute to this concern: reliably learning the underlying probability distribution of normal network data, and identifying the boundary between the normal and anomalous data regions in the latent space. Recent research has proposed many ways to learn the latent representation of normal data in a semi-supervised manner, such as the Clustering-based Autoencoder (CAE) and hybrid approaches that combine Principal Component Analysis (PCA) with CAE. However, such approaches are still affected by these limitations, predominantly due to an overreliance on feature engineering or an inability to handle high-dimensional data. In this paper, we propose a novel Cluster Variational Autoencoder (CVAE) deep learning model to overcome these limitations and increase the efficiency of network intrusion detection. The CVAE enables a more concise and dominant representation of the latent space to be learnt, while the probability distribution learning capabilities of the VAE are fully exploited to learn the underlying probability distribution of the normal network data. This combination enables us to address the limitations discussed. The performance of the proposed model is evaluated on eight benchmark network intrusion datasets: NSL-KDD, UNSW-NB15, CICIDS2017 and five scenarios from CTU13 (CTU13-08, CTU13-09, CTU13-10, CTU13-12 and CTU13-13). The experimental results demonstrate that the proposed method outperforms existing semi-supervised approaches.
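The abstract does not specify the CVAE architecture, so the following is only a generic, minimal sketch of two ingredients it refers to, not the authors' model. The first is the closed-form KL term that a standard VAE uses to regularize its Gaussian posterior toward a standard-normal prior, together with the reparameterization trick. The second is a semi-supervised, clustering-based boundary in latent space that is fit on normal data only. Assumptions: latent codes come from some trained encoder (replaced here by synthetic 2-D points), plain k-means stands in for whatever clustering the paper actually uses, and the 95th-percentile threshold is a hypothetical rule. All function names are illustrative.

```python
import math
import random

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), the usual VAE regularizer."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on latent codes (a stand-in, not the paper's clustering scheme)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def anomaly_score(z, centroids):
    """Distance from a latent code to the nearest cluster of normal traffic."""
    return math.sqrt(min(sq_dist(z, c) for c in centroids))

def fit_threshold(normal_latents, centroids, quantile=0.95):
    """Hypothetical boundary: a high quantile of anomaly scores on normal data only."""
    scores = sorted(anomaly_score(z, centroids) for z in normal_latents)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]

if __name__ == "__main__":
    # Synthetic "normal" latent codes: two tight blobs standing in for encoder outputs.
    rng = random.Random(1)
    normal = [(rng.gauss(cx, 0.3), rng.gauss(0.0, 0.3)) for cx in (-2.0, 2.0) for _ in range(100)]
    cents = kmeans(normal, k=2)
    thr = fit_threshold(normal, cents)
    for z in [(2.0, 0.0), (0.0, 5.0)]:
        print(z, "anomalous" if anomaly_score(z, cents) > thr else "normal")
```

Because only normal traffic is used to place the centroids and the threshold, a point far from every normal cluster is flagged without any attack labels. This is the semi-supervised setting the abstract describes, although the paper's own boundary may be learnt differently.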

