A New Information Measure Based on Example-Dependent Misclassification Costs and Its Application in Decision Tree Learning

F. Wysotzki, Peter Geibel
Journal: Adv. Artif. Intell., vol. 9 3, pp. 134807:1-134807:13
Published: 2009-01-01
DOI: 10.1155/2009/134807 (https://doi.org/10.1155/2009/134807)
Citations: 2

Abstract

This article describes how misclassification costs given with the individual training objects can be used in classification learning to construct decision trees that make minimal-cost rather than minimal-error class decisions. This is demonstrated by defining modified, cost-dependent probabilities and a new, cost-dependent information measure, and by using a cost-sensitive extension of the CAL5 algorithm for learning decision trees. The cost-dependent information measure ensures the selection of the (locally) next-best, that is, cost-minimizing, discriminating attribute in the sequential construction of the classification trees. The measure is shown to be a cost-dependent generalization of the classical information measure introduced by Shannon, which depends only on classical probabilities. It is therefore of general importance and extends classical information theory, knowledge processing, and cognitive science, since subjective evaluations of decision alternatives can be included in the entropy and the transferred information. Decision trees can then be viewed as cost-minimizing decoders for class symbols emitted by a source and coded by feature vectors. Experiments with two artificial datasets and one application example show that this approach is more accurate than a method that uses class-dependent costs given a priori by experts.
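
The abstract does not state the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' definition or the CAL5 extension. It assumes one plausible form of "cost-dependent probabilities": each training example contributes its own (positive) misclassification cost as a weight, and the class probabilities are the normalized cost sums per class. The function names `cost_weighted_probs`, `cost_weighted_entropy`, and `cost_weighted_gain` are hypothetical. With equal costs the measure reduces to the ordinary Shannon entropy.

```python
import math
from collections import defaultdict


def cost_weighted_probs(labels, costs):
    """Class probabilities weighted by per-example costs (assumed form).

    Assumes every cost is positive, so the total is never zero.
    """
    totals = defaultdict(float)
    for y, c in zip(labels, costs):
        totals[y] += c
    total = sum(totals.values())
    return {y: w / total for y, w in totals.items()}


def cost_weighted_entropy(labels, costs):
    """Shannon entropy (in bits) of the cost-weighted class distribution."""
    probs = cost_weighted_probs(labels, costs)
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def cost_weighted_gain(values, labels, costs):
    """Entropy reduction from splitting on a discrete attribute.

    Each branch is weighted by its share of the total cost, which is an
    assumption of this sketch rather than something stated in the abstract.
    """
    branches = defaultdict(lambda: ([], []))
    for v, y, c in zip(values, labels, costs):
        branches[v][0].append(y)
        branches[v][1].append(c)
    total = sum(costs)
    remainder = sum(
        (sum(cs) / total) * cost_weighted_entropy(ys, cs)
        for ys, cs in branches.values()
    )
    return cost_weighted_entropy(labels, costs) - remainder


labels = ["a", "a", "b", "b"]
uniform_costs = [1.0, 1.0, 1.0, 1.0]   # equal costs: classical case
skewed_costs = [3.0, 3.0, 1.0, 1.0]    # errors on class "a" cost more
attribute = ["x", "x", "y", "y"]       # attribute that separates the classes

print(cost_weighted_entropy(labels, uniform_costs))
print(cost_weighted_entropy(labels, skewed_costs))
print(cost_weighted_gain(attribute, labels, skewed_costs))
```

In a tree learner, an attribute-selection step built on this sketch would pick the attribute with the largest `cost_weighted_gain` at each node; how the paper actually combines costs with the CAL5 decision rule is not specified in the abstract.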