Enabling semi-supervised learning in intrusion detection systems

Impact Factor: 4.0 | CAS Zone 3 (Computer Science) | JCR Q1 (Computer Science, Theory & Methods) | Journal of Parallel and Distributed Computing | Pub Date: 2024-11-12 | DOI: 10.1016/j.jpdc.2024.105010

Panagis Sarantos, John Violos, Aris Leivadeas

Journal of Parallel and Distributed Computing, Volume 196, Article 105010. Journal Article. https://www.sciencedirect.com/science/article/pii/S0743731524001746


Abstract

Intrusion detection systems (IDS) are alerting cybersecurity tools that analyze network traffic to identify suspicious activity and known threats. State-of-the-art IDS rely on supervised machine learning models that are trained on a historical labeled dataset to categorize network flows. Nonetheless, next-generation networks are heterogeneous and dynamic. Heterogeneity can make each network environment significantly different from the others, and dynamicity means that new threats are constantly emerging. These two factors raise the research question of whether a supervised machine-learning-based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first answer this research question and then propose a semi-supervised learning approach that can generalize sufficiently to a different network environment using unlabeled data, taking into account that unlabeled data are much easier and cheaper to collect than labeled data. As a proof of concept, we ran experiments with two publicly available labeled datasets, CIC-IDS2017 and CIC-IDS2018, and one unlabeled dataset, PS-Azure2023, which we constructed for this work and have also made publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm to the design of IDS.
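The abstract does not say which semi-supervised technique the authors use. As a purely illustrative sketch, and not the paper's method, the general idea of adapting a model trained on labeled flows from one environment with unlabeled flows from another can be shown with self-training (pseudo-labeling) on a nearest-centroid classifier. The feature vectors, the `margin` threshold, and all function names below are hypothetical.

```python
from statistics import fmean

def centroids(X, y):
    """Mean feature vector for each class label."""
    cents = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [fmean(col) for col in zip(*rows)]
    return cents

def predict(cents, x):
    """Return (nearest label, distance gap to the runner-up centroid)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in cents.items()
    )
    (d_best, label), (d_next, _) = dists[0], dists[1]
    return label, d_next - d_best

def self_train(X_lab, y_lab, X_unlab, margin=1.0, rounds=5):
    """Grow the training set with confidently pseudo-labeled samples."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        cents = centroids(X, y)
        remaining, added = [], 0
        for x in pool:
            label, gap = predict(cents, x)
            if gap >= margin:  # confident enough: accept the pseudo-label
                X.append(x)
                y.append(label)
                added += 1
            else:              # ambiguous: leave for a later round
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
    return centroids(X, y)

# Toy data (hypothetical): labeled flows from a source environment,
# unlabeled flows from a shifted target environment.
X_lab = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
y_lab = ["benign"] * 3 + ["attack"] * 3
X_unlab = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6], [3, 3]]

model = self_train(X_lab, y_lab, X_unlab)
print(predict(model, [0.5, 0.5])[0], predict(model, [5, 5])[0])
```

The confidence gate (`margin`) is the key design choice in this sketch: a low threshold admits more target-environment samples but risks reinforcing wrong pseudo-labels, while a high one keeps the model close to its source-environment behavior.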




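Source journal: Journal of Parallel and Distributed Computing (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 10.30

Self-citation rate: 2.60%

Articles published: 172

Review time: 12 months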

About the journal: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics, again covering the full range from the design to the use of the targeted systems.