Improving Network Security: An Intrusion Detection System (IDS) Dataset from Higher Learning Institutions, Mbeya University of Science and Technology (MUST), Tanzania

East African Journal of Information Technology. Pub Date: 2023-12-15. DOI: 10.37284/eajit.6.1.1627

Daud M. Sindika, Mrindoko R. Nicholaus, Nabahani B. Hamadi



Abstract

In today's Internet-driven culture, securing computer networks in Higher Learning Institutions (HLIs) has become a major responsibility. Intrusion Detection Systems (IDS) are crucial for protecting networks from unauthorized activity and cyber threats. This paper examines how network security can be improved by building a comprehensive IDS dataset from real HLI traffic, and it highlights how accurate, representative data improves a system's ability to identify and mitigate future cyber-attacks. The IDS models were built with several machine learning (ML) techniques, and each model was assessed with accuracy, precision, recall, and F1-score. The dataset used for training and testing was real-world network traffic captured from the institution's computer network. The results showed that the developed IDS achieved high accuracy, with the Random Forest, Gradient Boosting, and XGBoost models each reaching around 93%. Precision and recall were likewise high across all algorithms. The study also found that data quality has a substantial impact on IDS performance: proper data preparation, feature engineering, and noise removal helped improve model accuracy and reduce false positives. Although the IDS models performed well throughout validation and testing, deploying such systems in a production setting requires careful planning. The paper therefore also examines the procedures for testing and deploying the IDS models in a real-world scenario, and it underlines the importance of ongoing monitoring and maintenance to keep the models effective at identifying intrusions. The research contributes to advancing network security in HLIs. By understanding the impact of data quality on IDS performance and adopting effective deployment techniques, educational institutions can better protect their valuable assets and sensitive information from cyberattacks.
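The four metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `binary_metrics`, the label encoding (1 = intrusion, 0 = benign), and the ten toy labels are all assumptions made for illustration and are not drawn from the MUST dataset.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels.

    Assumed (hypothetical) encoding: 1 = intrusion, 0 = benign traffic.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # intrusions caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # intrusions missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for this example (4 intrusions, 6 benign flows).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On these toy labels one intrusion is missed and one benign flow is flagged, so accuracy is 0.8 while precision, recall, and F1 are each 0.75. The gap shows why the abstract reports precision and recall alongside accuracy: a false alarm lowers precision, a missed intrusion lowers recall, and accuracy alone does not say which kind of error occurred.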


