MIM: A multiple integration model for intrusion detection on imbalanced samples

Zhiqiang Zhang, Le Wang, Junyi Zhu, Dong Zhu, Zhaoquan Gu, Yanchun Zhang
World Wide Web, published 2024-07-10. DOI: 10.1007/s11280-024-01285-0 (https://doi.org/10.1007/s11280-024-01285-0)

Abstract

Network security data is typically imbalanced: normal samples far outnumber malicious ones. On imbalanced samples, a classification model needs careful sampling and attribute selection to counter its bias towards majority classes. Simple data sampling methods and incomplete feature selection techniques cannot improve the accuracy of intrusion detection models. In addition, a single intrusion detection model cannot accurately classify all attack types in the face of massive imbalanced security data. Moreover, existing model integration methods based on stacking or voting suffer from high coupling, which undermines their stability and reliability. To address these issues, we propose a Multiple Integration Model (MIM) for feature selection and attack classification. First, MIM uses random Oversampling, random Undersampling and Washing Methods (OUWM) to reconstruct the data. Then, a modified simulated annealing algorithm generates candidate features. Finally, an integrated model based on Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost) and gradient Boosting with Categorical features support (CatBoost) performs intrusion detection and attack classification. MIM uses a Rule-based and Priority-based Ensemble Strategy (RPES) to combine the high accuracy of LightGBM with the high effectiveness of XGBoost and CatBoost, improving the stability and reliability of the integrated model. We evaluate the effectiveness of our approach on two publicly available intrusion detection datasets, as well as a dataset created by researchers from the University of New Brunswick and another dataset collected by the Australian Center for Cyber Security. In our experiments, MIM significantly outperforms several existing intrusion detection models in terms of accuracy. Specifically, compared with two recently proposed methods, namely the reinforcement learning method based on the adaptive sample distribution dual-experience replay pool mechanism (ASD2ER) and the method that combines Auto Encoder, Principal Component Analysis, and Long Short-Term Memory (AE+PCA+LSTM), MIM improves intrusion detection accuracy by 1.35% and 1.16%, respectively.
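To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The first function shows plain random oversampling and undersampling to a common class size; the second shows one possible priority-based way to combine the labels predicted by several models. The function names, the `target` parameter, the model keys, and the tie-breaking rule are all assumptions made for illustration. The abstract does not specify the washing step of OUWM or the actual rules of RPES, so neither is reproduced here.

```python
import random
from collections import Counter, defaultdict


def resample_to_target(samples, labels, target, seed=0):
    """Randomly over- or undersample every class to exactly `target` items.

    `target` is a hypothetical parameter, not a value from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_samples, out_labels = [], []
    for y, items in by_class.items():
        if len(items) >= target:
            # Majority class: keep a random subset, without replacement.
            chosen = rng.sample(items, target)
        else:
            # Minority class: keep everything and duplicate random members.
            chosen = items + rng.choices(items, k=target - len(items))
        out_samples.extend(chosen)
        out_labels.extend([y] * target)
    return out_samples, out_labels


def priority_vote(predictions, priority):
    """Combine per-model labels: the most common label wins, and ties go
    to the model that appears first in `priority`.

    This is an illustrative rule only, not the paper's RPES.
    """
    counts = Counter(predictions.values())
    best = max(counts.values())
    tied = {label for label, count in counts.items() if count == best}
    for model in priority:
        if predictions.get(model) in tied:
            return predictions[model]
    raise ValueError("priority list names no model present in predictions")
```

For example, calling `resample_to_target` on ten "normal" and two "attack" samples with `target=5` yields five samples of each class, and `priority_vote` with three disagreeing models returns the label of whichever model is listed first in `priority`.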
