Optimized hybrid imbalanced data sampling for decision tree training

Weronika Węgier, Michał Koziarski, Michal Wozniak
Proceedings of the Companion Conference on Genetic and Evolutionary Computation, published 2023-07-15
DOI: 10.1145/3583133.3590702 (https://doi.org/10.1145/3583133.3590702)

Abstract

For many real-world decision-making tasks, explainability of the decision is a key requirement. Glass-box models, which offer full explainability, therefore remain prevalent. An important application area is the classification of imbalanced data, where the classifier is expected to avoid errors on the minority class while minimizing errors on the majority class. This paper proposes a method for preprocessing imbalanced data by generating minority class objects. We use a multi-criteria optimization method (NSGA-II) rather than optimizing a single aggregate criterion. The method returns a set of non-dominated solutions from which the end user can choose the one that best suits their needs. Automatic selection of a solution from the Pareto front is also proposed for comparison purposes. The proposed method yields good-quality classifiers, often surpassing baseline single-objective methods, and is fully interpretable.
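
To make the notions of non-dominated solutions and automatic selection from a Pareto front concrete, here is a minimal Python sketch. It is not the authors' implementation: the two criteria (minority-class recall and majority-class recall, both maximized), the candidate scores, and the "closest to the ideal point" selection rule are all assumptions chosen for illustration, since the abstract does not specify them. The NSGA-II search itself is omitted; the sketch only filters and selects among already-evaluated candidates.

```python
from typing import List, Sequence, Tuple

# Assumed criteria: (minority-class recall, majority-class recall), both maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on every criterion and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def pick_closest_to_ideal(front: Sequence[Point], ideal: Point = (1.0, 1.0)) -> Point:
    """Hypothetical automatic selection rule: smallest squared distance to the ideal point."""
    return min(front, key=lambda p: sum((i - x) ** 2 for i, x in zip(ideal, p)))


# Made-up scores for five candidate resampling configurations (illustrative only).
candidates = [(0.95, 0.60), (0.80, 0.85), (0.70, 0.80), (0.60, 0.95), (0.80, 0.70)]
front = pareto_front(candidates)
chosen = pick_closest_to_ideal(front)
print(front)
print(chosen)
```

In this sketch an end user could inspect every point in `front` and pick by hand, for example favoring minority-class recall, while `pick_closest_to_ideal` stands in for one possible automatic choice.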