{"title":"Effectiveness of an ensemble technique based on the distributivity equation in detecting suspicious network activity","authors":"Ewa Rak , Jaromir Sarzyński , Rafał Rak","doi":"10.1016/j.fss.2024.109015","DOIUrl":null,"url":null,"abstract":"<div><p>With the growing complexity and frequency of cyber threats, there is a pressing need for more effective defense mechanisms. Machine learning offers the potential to analyze vast amounts of data and identify patterns indicative of malicious activity, enabling faster and more accurate threat detection. Ensemble methods, by incorporating diverse models with varying vulnerabilities, can increase resilience against adversarial attacks. This study covers the usage and evaluation of the relevance of an innovative approach of ensemble classification for identifying intrusion threats on a large CICIDS2017 dataset. The approach is based on the distributivity equation that appropriately aggregates the underlying classifiers. It combines various standard supervised classification algorithms, including Multilayer Perceptron Network, k-Nearest Neighbors, and Naive Bayes, to create an ensemble. Experiments were conducted to evaluate the effectiveness of the proposed hybrid ensemble method. The performance of the ensemble approach was compared with individual classifiers using measures such as accuracy, precision, recall, <em>F</em>-score, and area under the ROC curve. Additionally, comparisons were made with widely used state-of-the-art ensemble models, including the soft voting method (Weighted Average Probabilities), Adaptive Boosting (AdaBoost), and Histogram-based Gradient Boosting Classification Tree (HGBC) and with existing methods in the literature using the same dataset, such as Deep Belief Networks (DBN), Deep Feature Learning via Graph (Deep GFL). Based on these experiments, it was found that some ensemble methods, such as AdaBoost and Histogram-based Gradient Classification Tree, do not perform reliably for the specific task of identifying network attacks. This highlights the importance of understanding the context and requirements of the data and problem domain. The results indicate that the proposed hybrid ensemble method outperforms traditional algorithms in terms of classification precision and accuracy, and offers insights for improving the effectiveness of intrusion detection systems.</p></div>","PeriodicalId":55130,"journal":{"name":"Fuzzy Sets and Systems","volume":"488 ","pages":"Article 109015"},"PeriodicalIF":3.2000,"publicationDate":"2024-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Fuzzy Sets and Systems","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0165011424001611","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
With the growing complexity and frequency of cyber threats, there is a pressing need for more effective defense mechanisms. Machine learning offers the potential to analyze vast amounts of data and identify patterns indicative of malicious activity, enabling faster and more accurate threat detection. Ensemble methods, by incorporating diverse models with varying vulnerabilities, can increase resilience against adversarial attacks. This study covers the usage and evaluation of the relevance of an innovative approach of ensemble classification for identifying intrusion threats on a large CICIDS2017 dataset. The approach is based on the distributivity equation that appropriately aggregates the underlying classifiers. It combines various standard supervised classification algorithms, including Multilayer Perceptron Network, k-Nearest Neighbors, and Naive Bayes, to create an ensemble. Experiments were conducted to evaluate the effectiveness of the proposed hybrid ensemble method. The performance of the ensemble approach was compared with individual classifiers using measures such as accuracy, precision, recall, F-score, and area under the ROC curve. Additionally, comparisons were made with widely used state-of-the-art ensemble models, including the soft voting method (Weighted Average Probabilities), Adaptive Boosting (AdaBoost), and Histogram-based Gradient Boosting Classification Tree (HGBC) and with existing methods in the literature using the same dataset, such as Deep Belief Networks (DBN), Deep Feature Learning via Graph (Deep GFL). Based on these experiments, it was found that some ensemble methods, such as AdaBoost and Histogram-based Gradient Classification Tree, do not perform reliably for the specific task of identifying network attacks. This highlights the importance of understanding the context and requirements of the data and problem domain. The results indicate that the proposed hybrid ensemble method outperforms traditional algorithms in terms of classification precision and accuracy, and offers insights for improving the effectiveness of intrusion detection systems.
期刊介绍:
Since its launching in 1978, the journal Fuzzy Sets and Systems has been devoted to the international advancement of the theory and application of fuzzy sets and systems. The theory of fuzzy sets now encompasses a well organized corpus of basic notions including (and not restricted to) aggregation operations, a generalized theory of relations, specific measures of information content, a calculus of fuzzy numbers. Fuzzy sets are also the cornerstone of a non-additive uncertainty theory, namely possibility theory, and of a versatile tool for both linguistic and numerical modeling: fuzzy rule-based systems. Numerous works now combine fuzzy concepts with other scientific disciplines as well as modern technologies.
In mathematics fuzzy sets have triggered new research topics in connection with category theory, topology, algebra, analysis. Fuzzy sets are also part of a recent trend in the study of generalized measures and integrals, and are combined with statistical methods. Furthermore, fuzzy sets have strong logical underpinnings in the tradition of many-valued logics.