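Flow timeout matters: Investigating the impact of active and idle timeouts on the performance of machine learning models in detecting security threats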

Future Generation Computer Systems-The International Journal of Escience, Volume 166, Article 107641. IF 6.2; Zone 2 (Computer Science); JCR Q1 (Computer Science, Theory & Methods). Pub Date: 2025-05-01. Epub Date: 2024-12-11. DOI: 10.1016/j.future.2024.107641
Meryem Janati Idrissi, Hamza Alami, Abdelkader El Mahdaouy, Abdelhak Bouayad, Zakaria Yartaoui, Ismail Berrada
Citations: 0

Abstract

In the era of high-speed networks and massive data, several network security technologies are shifting focus from payload-based to flow-based methods. This has led to the incorporation of Machine Learning (ML) models in network security systems, where high-quality network flow features are of paramount importance. However, limited attention has been dedicated to studying the impact of the flow metering hyperparameters, specifically idle and active timeouts, on ML models’ performance. This paper therefore aims to address this gap by designing a series of experiments related to flow features and learning models in the case of Network Intrusion Detection Systems (NIDS). Our experiments investigate the impact that idle and active timeouts have on the quality of the features extracted from network data and, in turn, on the performance of ML models. To this end, we consider three flow exporters for feature extraction (NFStream, Zeek, and Argus), three ML models, and different feature sets. We conducted extensive experiments with public datasets including USTC-TFC2016, CICIDS2017, UNSW-NB15, and CUPID. The results show that the difference between the best and worst timeout combinations can reach 8.77% in terms of macro F1-score. They also reveal that sensitivity to changes in timeouts varies among models and feature sets. Finally, we propose a distributed learning approach based on federated learning, which shows potential in handling multiple NIDS with different timeout configurations. The code is available at https://github.com/meryemJanatiIdrissi/Flow-timeout-matters.
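
To make the two flow metering hyperparameters concrete, the following is a minimal sketch of how idle and active timeouts could cut the packets of one connection into flow records. It is not the code from the paper's repository and does not reproduce the exact expiry rules of NFStream, Zeek, or Argus, which differ in detail. The names `FlowRecord` and `split_into_flows` are hypothetical, and the rule used here (export a record when the gap since the last packet exceeds the idle timeout, or when the record's age exceeds the active timeout) is an assumed simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FlowRecord:
    """Hypothetical flow record: only timing and packet count are kept."""
    start: float   # timestamp of the first packet in the record (seconds)
    end: float     # timestamp of the last packet in the record (seconds)
    packets: int   # number of packets aggregated into the record


def split_into_flows(timestamps: List[float],
                     idle_timeout: float,
                     active_timeout: float) -> List[FlowRecord]:
    """Split the sorted packet timestamps of one 5-tuple into flow records.

    Assumed rule: the current record is exported when the next packet
    arrives more than `idle_timeout` seconds after the last packet, or
    more than `active_timeout` seconds after the record started.
    """
    records: List[FlowRecord] = []
    current: Optional[FlowRecord] = None
    for ts in timestamps:
        if current is not None:
            idle_expired = ts - current.end > idle_timeout
            active_expired = ts - current.start > active_timeout
            if idle_expired or active_expired:
                records.append(current)
                current = None
        if current is None:
            # Start a new record with this packet.
            current = FlowRecord(start=ts, end=ts, packets=1)
        else:
            # Extend the current record.
            current.end = ts
            current.packets += 1
    if current is not None:
        records.append(current)
    return records


if __name__ == "__main__":
    packet_times = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 13.0]
    for idle, active in [(60.0, 100.0), (5.0, 100.0), (60.0, 2.0)]:
        flows = split_into_flows(packet_times, idle, active)
        print(idle, active, [f.packets for f in flows])
```

Under this simplified rule, the same seven packets become one, two, or three records depending on the timeout pair, so per-flow features such as duration and packet count change with the configuration. That is the kind of effect on extracted features that the experiments described above examine.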
Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.