Empirical Performance of CART, C5.0 and Random Forest Classification Algorithms for Decision Trees
Bissilimou Racidatou Orounla, Akoeugnigan Idelphonse Sode, Kolawole Valère Salako, Romain Glèlè Kakaï
African Journal of Applied Statistics, Vol. 16, No. 1, 2023. DOI: 10.16929/ajas/2023.1399.274
Abstract
This study compares the performance of the CART, C5.0 and Random Forest (RF) algorithms. Twenty-five continuous predictors and 25 factors (categorical predictors) were simulated for a population of 10,000 observations. From this population, samples were drawn while varying the number of predictors, the proportion of categorical versus continuous predictors, and the sample size. The performance of all three tree algorithms increases with sample size and with the number of variables, but RF performs substantially better than CART and C5.0. Irrespective of the algorithm, performance decreases when categorical variables outnumber continuous ones.
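The sampling design summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The abstract fixes only the population size (10,000) and the predictor counts (25 continuous, 25 factors). The predictor distributions, the three factor levels, the labelling rule, and the grid levels for sample size, number of predictors and share of categorical predictors are all hypothetical. Each drawn sample would then be fitted with CART, C5.0 and RF and the fits compared on a performance measure that the abstract does not name.

```python
import itertools
import random

N_POP = 10_000          # population size stated in the abstract
N_CONT, N_CAT = 25, 25  # 25 continuous predictors and 25 factors (from the abstract)

# Hypothetical design levels: the abstract does not report the actual values.
SAMPLE_SIZES = [100, 500, 1000]
N_PREDICTORS = [10, 20, 30]
PROP_CATEGORICAL = [0.25, 0.5, 0.75]


def simulate_population(seed=0):
    """Simulate N_POP rows of (continuous predictors, factors, class label).

    ASSUMPTIONS (not from the study): continuous predictors ~ N(0, 1),
    factors drawn uniformly from three levels, and a binary label produced by
    an illustrative rule using two continuous predictors and one factor.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(N_POP):
        cont = [rng.gauss(0.0, 1.0) for _ in range(N_CONT)]
        cat = [rng.choice("abc") for _ in range(N_CAT)]
        # Hypothetical labelling rule with Gaussian noise.
        score = cont[0] + cont[1] + (1.0 if cat[0] == "a" else 0.0) + rng.gauss(0.0, 0.5)
        population.append((cont, cat, int(score > 0.5)))
    return population


def draw_sample(population, n, n_predictors, prop_categorical, rng):
    """Draw n rows without replacement and keep n_predictors predictors,
    of which round(prop_categorical * n_predictors) are factors."""
    n_cat = round(prop_categorical * n_predictors)
    n_cont = n_predictors - n_cat
    if n_cat > N_CAT or n_cont > N_CONT:
        raise ValueError("design point needs more predictors than were simulated")
    rows = rng.sample(population, n)
    # Each sampled row becomes (predictor values, label).
    return [(cont[:n_cont] + cat[:n_cat], label) for cont, cat, label in rows]


def design_grid():
    """Full factorial crossing of the hypothetical design levels."""
    return list(itertools.product(SAMPLE_SIZES, N_PREDICTORS, PROP_CATEGORICAL))


if __name__ == "__main__":
    rng = random.Random(1)
    population = simulate_population()
    for n, p, q in design_grid():
        sample = draw_sample(population, n, p, q, rng)
        # Here each sample would be fitted with CART, C5.0 and RF.
        print(f"n={n:5d} predictors={p:2d} categorical share={q:.2f} rows={len(sample)}")
```

Keeping the first columns of each type guarantees that the predictors used by the assumed labelling rule appear in most samples; a random choice of predictor columns is an equally reasonable alternative, since the abstract does not say how predictors were selected.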