在入侵检测系统中实现半监督学习

IF 3.4 3区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS Journal of Parallel and Distributed Computing Pub Date : 2024-11-12 DOI:10.1016/j.jpdc.2024.105010
Panagis Sarantos , John Violos , Aris Leivadeas
{"title":"在入侵检测系统中实现半监督学习","authors":"Panagis Sarantos ,&nbsp;John Violos ,&nbsp;Aris Leivadeas","doi":"10.1016/j.jpdc.2024.105010","DOIUrl":null,"url":null,"abstract":"<div><div>Intrusion Detection systems (IDS) are alerting cybersecurity tools that analyze network traffic in order to identify suspicious activity and known threats. State of the art IDS rely on supervised machine learning models which are trained to categorize the network flow with a historical labeled dataset. Nonetheless, next-generation networks are characterized as heterogeneous and dynamic. The heterogeneity can make every network environment to be significantly different and the dynamicity means that new threats are constantly emerging. These two factors raise the research question if a supervised machine learning based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first give an answer to this research question and next try to propose a semi-supervised learning approach that can be generalized sufficiently in a different network environment using unlabeled data, taking into consideration that unlabeled data are much easier and cheap to be collected compared to labeled ones. In order to have a proof of concept we made experiments with two labeled datasets CIC-IDS2017, CIC-IDS2018 which are publicly available and one unlabeled dataset PS-Azure2023 which we constructed for this work and make it also publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm for the design of IDS.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 105010"},"PeriodicalIF":3.4000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enabling semi-supervised learning in intrusion detection systems\",\"authors\":\"Panagis Sarantos ,&nbsp;John Violos ,&nbsp;Aris Leivadeas\",\"doi\":\"10.1016/j.jpdc.2024.105010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Intrusion Detection systems (IDS) are alerting cybersecurity tools that analyze network traffic in order to identify suspicious activity and known threats. State of the art IDS rely on supervised machine learning models which are trained to categorize the network flow with a historical labeled dataset. Nonetheless, next-generation networks are characterized as heterogeneous and dynamic. The heterogeneity can make every network environment to be significantly different and the dynamicity means that new threats are constantly emerging. These two factors raise the research question if a supervised machine learning based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first give an answer to this research question and next try to propose a semi-supervised learning approach that can be generalized sufficiently in a different network environment using unlabeled data, taking into consideration that unlabeled data are much easier and cheap to be collected compared to labeled ones. In order to have a proof of concept we made experiments with two labeled datasets CIC-IDS2017, CIC-IDS2018 which are publicly available and one unlabeled dataset PS-Azure2023 which we constructed for this work and make it also publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm for the design of IDS.</div></div>\",\"PeriodicalId\":54775,\"journal\":{\"name\":\"Journal of Parallel and Distributed Computing\",\"volume\":\"196 \",\"pages\":\"Article 105010\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Parallel and Distributed Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0743731524001746\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Parallel and Distributed Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0743731524001746","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

摘要

入侵检测系统(IDS)是一种警报网络安全工具,它通过分析网络流量来识别可疑活动和已知威胁。最先进的入侵检测系统依赖于有监督的机器学习模型,这些模型经过训练,能利用历史标注数据集对网络流量进行分类。然而,下一代网络具有异构和动态的特点。异构性使每个网络环境都大不相同,而动态性则意味着新的威胁不断出现。这两个因素提出了一个研究问题,即基于监督机器学习的 IDS 能否在不同于产生其标注训练数据的网络环境中有效工作。在本文中,我们首先给出了这一研究问题的答案,然后尝试提出一种半监督学习方法,这种方法可以在不同的网络环境中使用无标记数据进行充分推广,同时考虑到与有标记数据相比,无标记数据更容易收集且成本更低。为了验证这一概念,我们使用两个公开的标注数据集 CIC-IDS2017 和 CIC-IDS2018 以及一个非标注数据集 PS-Azure2023 进行了实验。结果证实了我们的假设以及半监督学习范式在 IDS 设计中的适用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Enabling semi-supervised learning in intrusion detection systems
Intrusion Detection systems (IDS) are alerting cybersecurity tools that analyze network traffic in order to identify suspicious activity and known threats. State of the art IDS rely on supervised machine learning models which are trained to categorize the network flow with a historical labeled dataset. Nonetheless, next-generation networks are characterized as heterogeneous and dynamic. The heterogeneity can make every network environment to be significantly different and the dynamicity means that new threats are constantly emerging. These two factors raise the research question if a supervised machine learning based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first give an answer to this research question and next try to propose a semi-supervised learning approach that can be generalized sufficiently in a different network environment using unlabeled data, taking into consideration that unlabeled data are much easier and cheap to be collected compared to labeled ones. In order to have a proof of concept we made experiments with two labeled datasets CIC-IDS2017, CIC-IDS2018 which are publicly available and one unlabeled dataset PS-Azure2023 which we constructed for this work and make it also publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm for the design of IDS.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing 工程技术-计算机:理论方法
CiteScore
10.30
自引率
2.60%
发文量
172
审稿时长
12 months
期刊介绍: This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.
期刊最新文献
Enabling semi-supervised learning in intrusion detection systems Fault-tolerance in biswapped multiprocessor interconnection networks Editorial Board Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues) Design and experimental evaluation of algorithms for optimizing the throughput of dispersed computing
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1