Development of Various Stacking Ensemble-Based HIDS Using ADFA Datasets

IEEE Open Journal of the Communications Society (IF 6.3; JCR Q1, Engineering, Electrical & Electronic). Pub Date: 2025-02-03. DOI: 10.1109/OJCOMS.2025.3538101
Hami Satilmiş;Sedat Akleylek;Zaliha Yüce Tok
Volume 6, pages 1170-1189. IEEE Xplore: https://ieeexplore.ieee.org/document/10870100/

Abstract

The rapid increase in the number of cyber attacks and the emergence of diverse attack variants pose significant threats to the security of computer systems and networks. Various intrusion detection systems (IDS) have been developed to defend computer systems and networks against these threats. One type of IDS, the host-based intrusion detection system (HIDS), focuses on securing a single host. Numerous HIDS incorporating various detection methods have been proposed in the literature. This study develops multiple machine learning (ML) models and stacking-ensemble-based models that can be used as detection methods in HIDS. First, n-grams and the standard bag-of-words (BoW), binary BoW, probability BoW, and term frequency-inverse document frequency (TF-IDF) BoW methods are applied to the ADFA-LD and ADFA-WD datasets. Mutual information and k-means are then used together for feature selection on the resulting BoW datasets. Individual models are trained using either the selected features or all features. The outputs of these individual models are then fed to extreme gradient boosting (XGBoost) and adaptive boosting (AdaBoost) models to build stacking-ensemble-based models. Among models using the ADFA-LD-based BoW datasets, the best accuracy (ACC), 0.9747, is achieved by the stacking-ensemble-based XGBoost model, which uses the standard BoW dataset and the selected features. Among models using the ADFA-WD-based BoW datasets, the stacking-ensemble-based XGBoost model is also the most accurate, with an ACC of 0.9163, using the standard BoW dataset and all features.
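The four BoW representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the sketch assumes that "probability BoW" means n-gram counts divided by the total number of n-grams in the trace, and that TF-IDF uses relative term frequency times log(N / df). The function names `ngrams` and `bow_variants` and the toy system-call traces are hypothetical.

```python
import math
from collections import Counter


def ngrams(trace, n):
    """Return the contiguous n-grams (as tuples) of a system-call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def bow_variants(traces, n=2):
    """Build standard, binary, probability, and TF-IDF BoW matrices.

    Assumed definitions (not taken from the paper):
      standard    = raw n-gram count
      binary      = 1 if the n-gram occurs in the trace, else 0
      probability = count / total n-grams in the trace
      tfidf       = probability * log(n_traces / document_frequency)
    """
    counts = [Counter(ngrams(t, n)) for t in traces]
    vocab = sorted(set().union(*counts))
    n_docs = len(traces)
    # Document frequency: number of traces that contain each n-gram.
    df = {g: sum(1 for c in counts if g in c) for g in vocab}

    standard, binary, prob, tfidf = [], [], [], []
    for c in counts:
        total = sum(c.values()) or 1  # guard against traces shorter than n
        row = [c[g] for g in vocab]
        standard.append(row)
        binary.append([1 if v else 0 for v in row])
        prob.append([v / total for v in row])
        tfidf.append([(v / total) * math.log(n_docs / df[g])
                      for v, g in zip(row, vocab)])
    return vocab, standard, binary, prob, tfidf


# Toy traces of system-call numbers (made up for illustration only).
traces = [[6, 11, 45, 6, 11], [6, 11, 3, 3]]
vocab, standard, binary, prob, tfidf = bow_variants(traces, n=2)
for name, matrix in [("standard", standard), ("binary", binary),
                     ("probability", prob), ("tfidf", tfidf)]:
    print(name, matrix)
```

Under these assumed definitions, an n-gram that appears in every trace gets a TF-IDF weight of zero, which is one reason the four representations can rank features differently before feature selection is applied.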