Identifying relevant features of CSE-CIC-IDS2018 dataset for the development of an intrusion detection system

Intelligent Data Analysis (IF 1; Zone 4, Computer Science; JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE). Published: 2024-02-21. DOI: 10.3233/ida-230264

László Göcs, Zsolt Csaba Johanyák

Abstract

Intrusion detection systems (IDSs) are essential elements of IT systems. Their key component is a classification module that continuously evaluates certain features of the network traffic and identifies possible threats. The efficiency of this module depends strongly on the proper selection of the features to be monitored. Therefore, identifying the minimal set of features needed to reliably distinguish malicious traffic from benign traffic is indispensable during the development of an IDS. This paper presents the preprocessing and feature selection workflow, together with its results, for the CSE-CIC-IDS2018 on AWS dataset, focusing on five attack types. To identify the relevant features, six feature selection methods were applied, and the final ranking of the features was derived from their average scores. Next, several feature subsets were formed using different ranking threshold values, and each subset was evaluated with five classification algorithms to determine the optimal feature set for each attack type. Four widely used metrics were considered during the evaluation.
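The ranking and subset-forming steps summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation. The abstract does not say how the scores of the six methods were made comparable before averaging, or whether the thresholds apply to scores or to rank positions, so this sketch assumes min-max normalization of each method's scores (higher meaning more relevant), an unweighted mean, and thresholds on the averaged score. The function names and the example features A, B, C with their scores are hypothetical.

```python
"""Sketch: combine feature scores from several selection methods by averaging.

Assumptions (not taken from the paper): each method returns one score per
feature where higher means more relevant; scores are min-max normalized per
method; the mean is unweighted; thresholds are applied to the averaged score.
"""
from typing import Dict, List, Sequence


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale one method's scores to [0, 1] so methods become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # The method scored every feature equally, so it favours none of them.
        return {f: 0.0 for f in scores}
    return {f: (s - lo) / (hi - lo) for f, s in scores.items()}


def average_scores(per_method: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each feature's normalized scores across all methods."""
    if not per_method:
        raise ValueError("at least one method's scores are required")
    features = set(per_method[0])
    if any(set(m) != features for m in per_method):
        raise ValueError("every method must score the same features")
    normalized = [normalize(m) for m in per_method]
    return {f: sum(n[f] for n in normalized) / len(normalized) for f in per_method[0]}


def rank_features(avg: Dict[str, float]) -> List[str]:
    """Feature names ordered from highest to lowest average score."""
    return sorted(avg, key=avg.get, reverse=True)


def subsets_by_threshold(
    avg: Dict[str, float], thresholds: Sequence[float]
) -> Dict[float, List[str]]:
    """For each threshold, keep the ranked features whose average score reaches it."""
    ranked = rank_features(avg)
    return {t: [f for f in ranked if avg[f] >= t] for t in thresholds}


if __name__ == "__main__":
    # Hypothetical scores from three methods (on different scales) for three features.
    example = [
        {"A": 4.0, "B": 0.0, "C": 2.0},
        {"A": 8.0, "B": 4.0, "C": 0.0},
        {"A": 1.0, "B": 0.0, "C": 3.0},
    ]
    avg = average_scores(example)
    print(rank_features(avg))  # ['A', 'C', 'B']
    print(subsets_by_threshold(avg, [0.1, 0.4, 0.7]))
    # {0.1: ['A', 'C', 'B'], 0.4: ['A', 'C'], 0.7: ['A']}
```

With real data, each dictionary would hold one feature selection method's scores over the dataset's feature columns, and each threshold subset would then be passed to the classifiers for evaluation. Normalizing per method keeps a method with large raw scores from dominating the average.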

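Source journal

Intelligent Data Analysis (Engineering & Technology / Computer Science: Artificial Intelligence)

CiteScore: 2.20
Self-citation rate: 5.90%
Articles published:
Review time: 3.3 months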

Journal description: Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers that discuss the development of new AI-related data analysis architectures, methodologies, and techniques, and their applications to various domains, are preferred.