{"title":"在入侵检测系统中实现半监督学习","authors":"Panagis Sarantos , John Violos , Aris Leivadeas","doi":"10.1016/j.jpdc.2024.105010","DOIUrl":null,"url":null,"abstract":"<div><div>Intrusion Detection systems (IDS) are alerting cybersecurity tools that analyze network traffic in order to identify suspicious activity and known threats. State of the art IDS rely on supervised machine learning models which are trained to categorize the network flow with a historical labeled dataset. Nonetheless, next-generation networks are characterized as heterogeneous and dynamic. The heterogeneity can make every network environment to be significantly different and the dynamicity means that new threats are constantly emerging. These two factors raise the research question if a supervised machine learning based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first give an answer to this research question and next try to propose a semi-supervised learning approach that can be generalized sufficiently in a different network environment using unlabeled data, taking into consideration that unlabeled data are much easier and cheap to be collected compared to labeled ones. In order to have a proof of concept we made experiments with two labeled datasets CIC-IDS2017, CIC-IDS2018 which are publicly available and one unlabeled dataset PS-Azure2023 which we constructed for this work and make it also publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm for the design of IDS.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"196 ","pages":"Article 105010"},"PeriodicalIF":3.4000,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enabling semi-supervised learning in intrusion detection systems\",\"authors\":\"Panagis Sarantos , John Violos , Aris Leivadeas\",\"doi\":\"10.1016/j.jpdc.2024.105010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Intrusion Detection systems (IDS) are alerting cybersecurity tools that analyze network traffic in order to identify suspicious activity and known threats. State of the art IDS rely on supervised machine learning models which are trained to categorize the network flow with a historical labeled dataset. Nonetheless, next-generation networks are characterized as heterogeneous and dynamic. The heterogeneity can make every network environment to be significantly different and the dynamicity means that new threats are constantly emerging. These two factors raise the research question if a supervised machine learning based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first give an answer to this research question and next try to propose a semi-supervised learning approach that can be generalized sufficiently in a different network environment using unlabeled data, taking into consideration that unlabeled data are much easier and cheap to be collected compared to labeled ones. In order to have a proof of concept we made experiments with two labeled datasets CIC-IDS2017, CIC-IDS2018 which are publicly available and one unlabeled dataset PS-Azure2023 which we constructed for this work and make it also publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm for the design of IDS.</div></div>\",\"PeriodicalId\":54775,\"journal\":{\"name\":\"Journal of Parallel and Distributed Computing\",\"volume\":\"196 \",\"pages\":\"Article 105010\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Parallel and Distributed Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0743731524001746\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Parallel and Distributed Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0743731524001746","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Enabling semi-supervised learning in intrusion detection systems
Intrusion Detection systems (IDS) are alerting cybersecurity tools that analyze network traffic in order to identify suspicious activity and known threats. State of the art IDS rely on supervised machine learning models which are trained to categorize the network flow with a historical labeled dataset. Nonetheless, next-generation networks are characterized as heterogeneous and dynamic. The heterogeneity can make every network environment to be significantly different and the dynamicity means that new threats are constantly emerging. These two factors raise the research question if a supervised machine learning based IDS can work efficiently in a network environment different from the one that generated its labeled training data. In this paper, we first give an answer to this research question and next try to propose a semi-supervised learning approach that can be generalized sufficiently in a different network environment using unlabeled data, taking into consideration that unlabeled data are much easier and cheap to be collected compared to labeled ones. In order to have a proof of concept we made experiments with two labeled datasets CIC-IDS2017, CIC-IDS2018 which are publicly available and one unlabeled dataset PS-Azure2023 which we constructed for this work and make it also publicly available. The results confirm our assumption and the applicability of the semi-supervised learning paradigm for the design of IDS.
期刊介绍:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.