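Flow timeout matters: Investigating the impact of active and idle timeouts on the performance of machine learning models in detecting security threats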

Future Generation Computer Systems-The International Journal of Escience (IF 6.2; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, THEORY & METHODS). Pub Date: 2025-05-01. Epub Date: 2024-12-11. DOI: 10.1016/j.future.2024.107641

Meryem Janati Idrissi, Hamza Alami, Abdelkader El Mahdaouy, Abdelhak Bouayad, Zakaria Yartaoui, Ismail Berrada

Volume 166, Article 107641. Publisher page: https://www.sciencedirect.com/science/article/pii/S0167739X24006058

Citations: 0

Abstract


Flow timeout matters: Investigating the impact of active and idle timeouts on the performance of machine learning models in detecting security threats

In the era of high-speed networks and massive data, several network security technologies are shifting focus from payload-based to flow-based methods. This has led to the incorporation of Machine Learning (ML) models in network security systems, where high-quality network flow features are of paramount importance. However, limited attention has been paid to the impact of flow metering hyperparameters, specifically idle and active timeouts, on the performance of ML models. This paper therefore aims to address this gap by designing a series of experiments on flow features and learning models in the context of Network Intrusion Detection Systems (NIDS). Our experiments investigate the impact of idle and active timeouts on the quality of the features extracted from network data and, in turn, on the performance of ML models. To this end, we consider three flow exporters for feature extraction (NFStream, Zeek, and Argus), three ML models, and different feature sets. We conducted extensive experiments with public datasets, including USTC-TFC2016, CICIDS2017, UNSW-NB15, and CUPID. The results show that the difference between the best and worst timeout combinations may reach up to 8.77% in terms of macro F1-score. They also reveal that sensitivity to timeout changes varies across models and feature sets. Finally, we propose a distributed learning approach based on federated learning, which showed potential in handling multiple NIDS with different timeout configurations. The code is available at https://github.com/meryemJanatiIdrissi/Flow-timeout-matters.
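To make the two hyperparameters concrete, the following is a minimal Python sketch of how idle and active timeouts can split the same packet stream into different flow records. It is an illustration only, not the paper's code and not the logic of NFStream, Zeek, or Argus: the names `FlowRecord` and `meter_flows` are hypothetical, packets are reduced to `(timestamp, key)` pairs, and expiry is checked only when the next packet of a flow arrives, whereas real exporters typically also expire flows on a timer.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    key: str        # hypothetical flow key, e.g. a 5-tuple rendered as a string
    start: float    # timestamp of the first packet in the record (seconds)
    end: float      # timestamp of the last packet in the record (seconds)
    packets: int = 1


def meter_flows(packets, idle_timeout, active_timeout):
    """Group (timestamp, key) packets into flow records.

    An open record is exported when the gap since its last packet exceeds
    idle_timeout, or when the next packet arrives more than active_timeout
    seconds after the record started. Simplified assumption: expiry is
    evaluated only on packet arrival.
    """
    open_flows = {}   # key -> currently open FlowRecord
    exported = []
    for ts, key in sorted(packets):
        rec = open_flows.get(key)
        if rec is not None:
            idle_expired = ts - rec.end > idle_timeout
            active_expired = ts - rec.start > active_timeout
            if idle_expired or active_expired:
                exported.append(rec)
                rec = None
        if rec is None:
            # Start a new record with this packet.
            open_flows[key] = FlowRecord(key, ts, ts)
        else:
            # Extend the open record.
            rec.end = ts
            rec.packets += 1
    # Flush whatever is still open at the end of the capture.
    exported.extend(open_flows.values())
    return exported


# Same five packets, three timeout settings.
trace = [(0.0, "A"), (1.0, "A"), (2.0, "A"), (10.0, "A"), (11.0, "A")]
for idle, active in [(60, 100), (5, 100), (60, 1.5)]:
    records = meter_flows(trace, idle, active)
    print(idle, active, [r.packets for r in records])
```

With these inputs the same trace yields one record of 5 packets, two records (3 and 2 packets), or three records (2, 1, and 2 packets), depending on the timeouts. Per-flow features such as packet counts and durations therefore change with the configuration, which is the kind of effect on feature quality that the abstract describes.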


Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology / Computer Science: Theory & Methods)

CiteScore: 19.90

Self-citation rate: 2.70%

Articles published: 376

Review time: 10.6 months

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.