An experimental investigation of the effect of discrete attributes on the precision of classification methods

Reza Entezari-Maleki, Seyyed Mehdi Iranmanesh, B. Minaei-Bidgoli
Published in: 2009 International Conference on Information and Communication Technologies, 2009-09-25
DOI: 10.1109/ICICT.2009.5267189 (https://doi.org/10.1109/ICICT.2009.5267189)
Citations: 7

Abstract

This paper compares the precision of the logistic regression, Naïve-Bayes, and linear data classification methods with respect to the Area Under Curve (AUC) metric. The effects of several parameters are investigated: the size of the dataset, the kind of independent attributes, the number of discrete attributes, and the number of values those attributes take. The results indicate that on datasets consisting of both discrete and continuous attributes, the three classifiers achieve the same AUC. As the number of discrete attributes increases, the AUC of logistic regression increases, and its precision exceeds that of the other two classifiers. Likewise, as the number of values in the discrete attributes increases, the AUC of the logistic regression classifier increases and the AUC of linear regression decreases, while the AUC of the Naïve-Bayes classifier remains constant. These results can help data miners select the more efficient classifier based on the characteristics of the features in their datasets.
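Since every comparison in the abstract is stated in terms of AUC, it may help to see how that metric can be computed from a classifier's scores. The following is a minimal sketch, not the authors' code: the function name `auc` and the toy labels and scores are assumptions made for illustration, and the pairwise (Mann-Whitney) formulation is one standard way to obtain the area under the ROC curve.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs that the scores rank correctly.
    Tied scores count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the paper):
# true class labels and the scores some classifier assigned to them.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75: three of the four pos/neg pairs are ordered correctly
```

Because AUC depends only on how the scores rank the examples, the same function can be applied to the outputs of logistic regression, Naïve-Bayes, or a linear model, which is what makes it usable as a common yardstick across classifiers.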