A New Information Measure Based on Example-Dependent Misclassification Costs and Its Application in Decision Tree Learning

Adv. Artif. Intell. Pub Date: 2009-01-01. DOI: 10.1155/2009/134807

F. Wysotzki, Peter Geibel


Citations: 2

Abstract

This article describes how misclassification costs given with the individual training objects can be used to construct decision trees that make minimal-cost rather than minimal-error class decisions. This is demonstrated by defining modified, cost-dependent probabilities and a new, cost-dependent information measure, and by using a cost-sensitive extension of the CAL5 algorithm for learning decision trees. The cost-dependent information measure ensures the selection of the (locally) next best, that is, cost-minimizing, discriminating attribute in the sequential construction of the classification trees. The measure is shown to be a cost-dependent generalization of the classical information measure introduced by Shannon, which depends only on classical probabilities. It is therefore of general importance and extends classical information theory, knowledge processing, and cognitive science, since subjective evaluations of decision alternatives can be included in entropy and the transferred information. Decision trees can then be viewed as cost-minimizing decoders for class symbols emitted by a source and coded by feature vectors. Experiments with two artificial datasets and one application example show that this approach is more accurate than a method which uses class-dependent costs given by experts a priori.
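The abstract does not state the formulas, so the following is only an illustrative sketch of the general idea, not the paper's definition and not the CAL5 extension. It assumes that each training example carries a positive misclassification cost, that "cost-dependent probabilities" can be approximated by letting each example count in proportion to its cost, and that the attribute score is the reduction in the resulting entropy. The function names `cost_weighted_probs`, `entropy`, and `cost_weighted_gain` are hypothetical.

```python
import math
from collections import defaultdict


def cost_weighted_probs(labels, costs):
    """Class probabilities where each example counts in proportion to its cost.

    Assumption: costs are positive numbers, one per training example.
    """
    totals = defaultdict(float)
    for label, cost in zip(labels, costs):
        totals[label] += cost
    norm = sum(totals.values())
    return {label: total / norm for label, total in totals.items()}


def entropy(probs):
    """Shannon entropy (in bits) of a probability dictionary."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def cost_weighted_gain(values, labels, costs):
    """Reduction in cost-weighted entropy from splitting on a discrete attribute.

    Illustrative stand-in for a cost-dependent information measure;
    the paper's actual measure may differ.
    """
    parent = entropy(cost_weighted_probs(labels, costs))
    groups = defaultdict(lambda: ([], []))
    for value, label, cost in zip(values, labels, costs):
        groups[value][0].append(label)
        groups[value][1].append(cost)
    total_cost = sum(costs)
    children = sum(
        (sum(group_costs) / total_cost)
        * entropy(cost_weighted_probs(group_labels, group_costs))
        for group_labels, group_costs in groups.values()
    )
    return parent - children
```

Under these assumptions, setting all costs equal reproduces the ordinary Shannon entropy and information gain, which mirrors the abstract's claim that the cost-dependent measure generalizes the classical one.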



