{"title":"Empirical Comparison of Cross-Validation and Test Data on Internet Traffic Classification Methods","authors":"Oluranti Jonathan, N. Omoregbe, S. Misra","doi":"10.1088/1742-6596/1299/1/012044","DOIUrl":null,"url":null,"abstract":"In this paper, we compare two validation methods that are used to estimate the performance of classification algorithms in a non-problem-specific knowledge scenario. One way to measure the performance of a classification algorithm is to determine its prediction error rate. However, this value cannot be calculated but estimated. In this work, we apply and compare two common methods used for estimation namely: test data and cross-validation. Precisely, we analyze and compare the statistical properties of the K-fold cross-validation and test data estimators of the prediction error rates of six classifiers namely; Naïve Bayes, KNN, Random Forest, SVM, J48, and OneR. From the study, the statistical property of repeated cross-validation tends to stabilize the prediction error estimation which in turn reduces the variance of the prediction error estimator when compared with test data. The NIMS dataset collected over a network was employed in the experimental study.","PeriodicalId":16821,"journal":{"name":"Journal of Physics: Conference Series","volume":"4 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Physics: Conference Series","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/1742-6596/1299/1/012044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
In this paper, we compare two validation methods used to estimate the performance of classification algorithms in a scenario with no problem-specific knowledge. One way to measure the performance of a classification algorithm is to determine its prediction error rate; however, this value cannot be computed exactly and must be estimated. In this work, we apply and compare two common estimation methods: held-out test data and cross-validation. Specifically, we analyze and compare the statistical properties of the K-fold cross-validation and test-data estimators of the prediction error rates of six classifiers: Naïve Bayes, KNN, Random Forest, SVM, J48, and OneR. The study shows that repeated cross-validation tends to stabilize the prediction error estimate, which in turn reduces the variance of the prediction error estimator compared with a single held-out test set. The NIMS dataset, collected over a network, was employed in the experimental study.
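The two estimators contrasted in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical, stdlib-only illustration, not the paper's actual experimental setup: it uses a synthetic one-dimensional two-class dataset and a trivial 1-nearest-neighbour classifier (standing in for the six classifiers studied) to compare a single hold-out estimate against a repeated 10-fold cross-validation estimate of the error rate.

```python
import random

def nn_predict(train, x):
    # 1-nearest-neighbour on one feature: label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, test):
    # fraction of test points whose predicted label differs from the true one
    return sum(nn_predict(train, x) != y for x, y in test) / len(test)

def kfold_error(data, k, rng):
    # shuffle, split into k folds, average the k per-fold error rates
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        test = folds[i]
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        errs.append(error_rate(train, test))
    return sum(errs) / k

rng = random.Random(0)
# synthetic two-class data with overlapping Gaussians (illustrative only)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(100)] +
        [(rng.gauss(1.5, 1.0), 1) for _ in range(100)])

# estimator 1: a single 70/30 hold-out split
rng.shuffle(data)
split = int(0.7 * len(data))
holdout_err = error_rate(data[:split], data[split:])

# estimator 2: 10-fold cross-validation, repeated 10 times and averaged
cv_runs = [kfold_error(data, 10, rng) for _ in range(10)]
cv_mean = sum(cv_runs) / len(cv_runs)

print(f"hold-out error: {holdout_err:.3f}")
print(f"repeated 10-fold CV error: {cv_mean:.3f}")
```

Averaging over repeated shuffled k-fold runs is what the abstract refers to as the variance-reducing property of repeated cross-validation: each run's estimate varies with the random fold assignment, and averaging damps that variation, whereas a single hold-out split yields one estimate tied to one arbitrary partition.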