Support Vector Machine and Random Forest Modeling for Intrusion Detection System (IDS)

Journal of Intelligent Learning Systems and Applications, Pub Date: 2014-01-27, DOI: 10.4236/JILSA.2014.61005

Md. Al Mehedi Hasan, M. Nasser, B. Pal, Shamim Ahmad

Volume 6, Issue 1, pages 45-52. Journal article.

Citations: 134

Abstract

Building a successful Intrusion Detection System (IDS) is a complicated problem because of its nonlinearity and because network traffic data streams mix quantitative and qualitative attributes across many features. To address this problem, several types of intrusion detection methods have been proposed, with differing levels of accuracy. Choosing an effective and robust method for IDS is therefore a very important topic in information security. In this work, we built two classification models: one based on Support Vector Machines (SVM) and the other on Random Forests (RF). Experimental results show that both classifiers are effective. SVM is slightly more accurate but more expensive in terms of time. RF produces similar accuracy much faster when given modeling parameters. These classifiers can contribute to an IDS as one source of analysis and increase its accuracy. In this paper, the KDD’99 dataset is used to find out which of the two is the better intrusion detector for this dataset. Statistical analysis of the KDD’99 dataset found important issues that strongly affect the performance of evaluated systems and result in a very poor evaluation of anomaly detection approaches. The most important deficiency in the KDD’99 dataset is the huge number of redundant records. To solve these issues, we developed a new dataset, KDD99Train+ and KDD99Test+, which contains no redundant records in either the train set or the test set, so the classifiers will not be biased towards more frequent records. The numbers of records in the train and test sets are now reasonable, which makes it affordable to run the experiments on the complete set without needing to randomly select a small portion. The findings of this paper will help in using SVM and RF in a more meaningful way to maximize the performance rate and minimize the false negative rate.
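The two ideas the abstract names, removing redundant records and measuring the false negative rate, can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that "redundant" means an exact duplicate of both features and label (the abstract does not define the term), and that a false negative is an attack record classified as normal. The function names `drop_redundant` and `false_negative_rate` and the toy records are hypothetical, chosen only for illustration.

```python
from collections import Counter


def drop_redundant(records):
    """Keep the first occurrence of each (features, label) pair, preserving order.

    Assumption: a record is redundant only if features AND label match exactly.
    """
    seen = set()
    unique = []
    for features, label in records:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append((features, label))
    return unique


def false_negative_rate(y_true, y_pred, normal_label="normal"):
    """Fraction of true attack records that were predicted as normal."""
    attacks = [(t, p) for t, p in zip(y_true, y_pred) if t != normal_label]
    if not attacks:
        return 0.0
    missed = sum(1 for _, p in attacks if p == normal_label)
    return missed / len(attacks)


# Toy records (hypothetical values, not taken from KDD'99):
# each entry is ((duration, protocol, service), label).
toy = [
    ((0, "tcp", "http"), "normal"),
    ((0, "icmp", "ecr_i"), "smurf"),
    ((0, "icmp", "ecr_i"), "smurf"),
    ((0, "icmp", "ecr_i"), "smurf"),
    ((2, "tcp", "telnet"), "guess_passwd"),
]

deduped = drop_redundant(toy)
before = Counter(label for _, label in toy)
after = Counter(label for _, label in deduped)

# Hypothetical classifier output on four records, two attacks missed.
fnr = false_negative_rate(
    ["normal", "smurf", "guess_passwd", "smurf"],
    ["normal", "smurf", "normal", "normal"],
)

print("records before/after:", len(toy), len(deduped))
print("label counts before:", dict(before))
print("label counts after:", dict(after))
print("false negative rate:", fnr)
```

In the toy data the repeated record dominates the label counts before deduplication and counts once afterwards, which is the kind of bias towards frequent records that the abstract describes.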

