Determining Resampling Ratios Using BSMOTE and SVM-SMOTE for Identifying Rare Attacks in Imbalanced Cybersecurity Data

IF 2.6 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Computers Pub Date : 2023-10-11 DOI:10.3390/computers12100204
Sikha S. Bagui, Dustin Mink, Subhash C. Bagui, Sakthivel Subramaniam
{"title":"Determining Resampling Ratios Using BSMOTE and SVM-SMOTE for Identifying Rare Attacks in Imbalanced Cybersecurity Data","authors":"Sikha S. Bagui, Dustin Mink, Subhash C. Bagui, Sakthivel Subramaniam","doi":"10.3390/computers12100204","DOIUrl":null,"url":null,"abstract":"Machine Learning is widely used in cybersecurity for detecting network intrusions. Though network attacks are increasing steadily, the percentage of such attacks to actual network traffic is significantly less. And here lies the problem in training Machine Learning models to enable them to detect and classify malicious attacks from routine traffic. The ratio of actual attacks to benign data is significantly high and as such forms highly imbalanced datasets. In this work, we address this issue using data resampling techniques. Though there are several oversampling and undersampling techniques available, how these oversampling and undersampling techniques are most effectively used is addressed in this paper. Two oversampling techniques, Borderline SMOTE and SVM-SMOTE, are used for oversampling minority data and random undersampling is used for undersampling majority data. Both the oversampling techniques use KNN after selecting a random minority sample point, hence the impact of varying KNN values on the performance of the oversampling technique is also analyzed. Random Forest is used for classification of the rare attacks. This work is done on a widely used cybersecurity dataset, UNSW-NB15, and the results show that 10% oversampling gives better results for both BMSOTE and SVM-SMOTE.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"17 1","pages":"0"},"PeriodicalIF":2.6000,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/computers12100204","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Machine Learning is widely used in cybersecurity for detecting network intrusions. Though network attacks are increasing steadily, the percentage of such attacks to actual network traffic is significantly less. And here lies the problem in training Machine Learning models to enable them to detect and classify malicious attacks from routine traffic. The ratio of actual attacks to benign data is significantly high and as such forms highly imbalanced datasets. In this work, we address this issue using data resampling techniques. Though there are several oversampling and undersampling techniques available, how these oversampling and undersampling techniques are most effectively used is addressed in this paper. Two oversampling techniques, Borderline SMOTE and SVM-SMOTE, are used for oversampling minority data and random undersampling is used for undersampling majority data. Both the oversampling techniques use KNN after selecting a random minority sample point, hence the impact of varying KNN values on the performance of the oversampling technique is also analyzed. Random Forest is used for classification of the rare attacks. This work is done on a widely used cybersecurity dataset, UNSW-NB15, and the results show that 10% oversampling gives better results for both BMSOTE and SVM-SMOTE.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用BSMOTE和SVM-SMOTE确定重采样比识别不平衡网络安全数据中的罕见攻击
机器学习被广泛应用于网络安全领域,用于检测网络入侵。尽管网络攻击正在稳步增加,但此类攻击占实际网络流量的比例明显较低。这里的问题在于训练机器学习模型,使它们能够从日常流量中检测和分类恶意攻击。实际攻击与良性数据的比例非常高,因此形成了高度不平衡的数据集。在这项工作中,我们使用数据重采样技术解决了这个问题。虽然有几种可用的过采样和欠采样技术,但如何最有效地使用这些过采样和欠采样技术是本文的重点。两种过采样技术:Borderline SMOTE和SVM-SMOTE用于过采样少数数据,随机欠采样用于欠采样多数数据。两种过采样技术都是在随机选择少数样本点后使用KNN,因此还分析了不同KNN值对过采样技术性能的影响。随机森林用于罕见攻击的分类。这项工作是在一个广泛使用的网络安全数据集UNSW-NB15上完成的,结果表明,10%的过采样对BMSOTE和SVM-SMOTE都有更好的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Computers
Computers COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-
CiteScore
5.40
自引率
3.60%
发文量
153
审稿时长
11 weeks
期刊最新文献
Advanced Road Safety: Collective Perception for Probability of Collision Estimation of Connected Vehicles Forecasting of Bitcoin Illiquidity Using High-Dimensional and Textual Features Mining Negative Associations from Medical Databases Considering Frequent, Regular, Closed and Maximal Patterns Faraway, so Close: Perceptions of the Metaverse on the Edge of Madness Blockchain-Powered Gaming: Bridging Entertainment with Serious Game Objectives
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1