Reza Entezari-Maleki, Seyyed Mehdi Iranmanesh, B. Minaei-Bidgoli
An experimental investigation of the effect of discrete attributes on the precision of classification methods
2009 International Conference on Information and Communication Technologies, published 2009-09-25
DOI: 10.1109/ICICT.2009.5267189 (https://doi.org/10.1109/ICICT.2009.5267189)
Citations: 7
Abstract
This paper compares the precision of the logistic regression, Naïve Bayes, and linear classification methods, as measured by the Area Under the Curve (AUC) metric. It investigates the effect of several parameters: the size of the dataset, the kind of independent attributes, the number of discrete attributes, and the values of those attributes. The results show that on datasets containing both discrete and continuous attributes, the three classifiers achieve the same AUC. As the number of discrete attributes increases, the AUC of logistic regression increases, and its precision exceeds that of the other two classifiers. As the number of values in the discrete attributes increases, the AUC of the logistic regression classifier increases and that of linear regression decreases, while the AUC of the Naïve Bayes classifier remains constant. These results can help data miners select the more efficient classifier based on the characteristics of the features in their datasets.
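For readers unfamiliar with the metric, the following is a minimal sketch of how AUC can be computed from classifier scores. It uses the standard pairwise interpretation: AUC is the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The function name `auc` and the toy labels and scores are illustrative assumptions and are not taken from the paper or its experiments.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    labels: sequence of 0/1 class labels (1 = positive class).
    scores: classifier scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two classifiers on the same four examples.
labels = [0, 0, 1, 1]
scores_a = [0.1, 0.4, 0.35, 0.8]   # one positive is ranked below one negative
scores_b = [0.2, 0.3, 0.6, 0.9]    # every positive is ranked above every negative

print(auc(labels, scores_a))  # 0.75
print(auc(labels, scores_b))  # 1.0
```

Because the value depends only on how the scores rank the examples, it allows classifiers with differently scaled outputs, such as the three compared here, to be put on a common footing. The quadratic loop is kept for readability; a rank-based computation would be preferable for large datasets.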