Meryem Janati Idrissi, Hamza Alami, Abdelkader El Mahdaouy, Abdelhak Bouayad, Zakaria Yartaoui, Ismail Berrada
Future Generation Computer Systems (The International Journal of eScience), Volume 166, Article 107641, 2025 (published online 11 December 2024). DOI: 10.1016/j.future.2024.107641
Flow timeout matters: Investigating the impact of active and idle timeouts on the performance of machine learning models in detecting security threats
In the era of high-speed networks and massive data volumes, many network security technologies are shifting from payload-based to flow-based methods. This shift has led to the adoption of Machine Learning (ML) models in network security systems, where high-quality network flow features are of paramount importance. However, little attention has been paid to how the flow metering hyperparameters, specifically the idle and active timeouts, affect ML model performance. This paper therefore addresses this gap through a series of experiments on flow features and learning models for Network Intrusion Detection Systems (NIDS). Our experiments investigate how idle and active timeouts affect the quality of the features extracted from network data and, in turn, the performance of ML models. To this end, we consider three flow exporters for feature extraction (NFStream, Zeek, and Argus), three ML models, and several feature sets. We conducted extensive experiments on public datasets, including USTC-TFC2016, CICIDS2017, UNSW-NB15, and CUPID. The results show that the gap in macro F1-score between the best and worst timeout combinations can reach 8.77%. They also reveal that models and feature sets differ in their sensitivity to timeout changes. Finally, we propose a distributed learning approach based on federated learning, which shows promise for handling multiple NIDS with different timeout configurations. The code is available at https://github.com/meryemJanatiIdrissi/Flow-timeout-matters.
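To make the two hyperparameters concrete, the following is a minimal sketch of how a flow meter might apply them when grouping packets into flows. It is not the implementation used by NFStream, Zeek, or Argus, nor the paper's code. The `Packet`/`Flow` types, the `meter_flows` function, the strict `>` comparisons, and the lazy per-key expiry are all hypothetical choices made for illustration. Real exporters differ in how they define and check each timeout (for example, periodic expiry scans, bidirectional keys, or TCP FIN/RST handling).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol).
FlowKey = Tuple[str, str, int, int, str]


@dataclass
class Packet:
    ts: float     # capture timestamp in seconds
    key: FlowKey  # 5-tuple the packet belongs to
    size: int     # packet length in bytes


@dataclass
class Flow:
    key: FlowKey
    first_ts: float
    last_ts: float
    packets: int = 0
    byte_count: int = 0
    end_reason: str = "open"


def meter_flows(packets: List[Packet], idle_timeout: float,
                active_timeout: float) -> List[Flow]:
    """Split packets into flows using idle and active timeouts (sketch)."""
    open_flows: Dict[FlowKey, Flow] = {}
    finished: List[Flow] = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        flow = open_flows.get(pkt.key)
        if flow is not None:
            if pkt.ts - flow.last_ts > idle_timeout:
                # Silence longer than idle_timeout: the previous flow expired.
                flow.end_reason = "idle"
                finished.append(flow)
                flow = None
            elif pkt.ts - flow.first_ts > active_timeout:
                # Flow has lasted longer than active_timeout: cut it here.
                flow.end_reason = "active"
                finished.append(flow)
                flow = None
        if flow is None:
            # The packet that triggered expiry (or a new key) opens a new flow.
            flow = Flow(pkt.key, first_ts=pkt.ts, last_ts=pkt.ts)
            open_flows[pkt.key] = flow
        flow.last_ts = pkt.ts
        flow.packets += 1
        flow.byte_count += pkt.size
    # Flush flows still open when the capture ends.
    for flow in open_flows.values():
        flow.end_reason = "end_of_capture"
        finished.append(flow)
    return sorted(finished, key=lambda f: f.first_ts)


if __name__ == "__main__":
    k = ("10.0.0.1", "10.0.0.2", 40000, 80, "TCP")
    # Eight packets with an 8-second gap between t=2 and t=10.
    pkts = [Packet(float(t), k, 100) for t in [0, 1, 2, 10, 11, 12, 13, 14]]
    for idle, act in [(5, 100), (15, 100), (15, 3)]:
        flows = meter_flows(pkts, idle_timeout=idle, active_timeout=act)
        print(idle, act, [f.packets for f in flows])
    # Prints three lines: "5 100 [3, 5]", "15 100 [8]", "15 3 [3, 4, 1]"
```

With the same eight packets, a 5 s idle timeout yields two flows, while widening it to 15 s merges them into one. A 3 s active timeout then cuts that long flow into three pieces. Per-flow features such as packet counts, byte counts, and durations change accordingly, which is one way timeout choices can shift the inputs an ML model sees.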
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, the explosion of Big Data calls for innovative methods and infrastructures to collect and analyze the vast amounts of data being generated and to derive meaningful insights from them. This requires integrating computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.