Cost‐sensitive classification with time constraint on incomplete data

IF 2.1 | Zone 4 (Mathematics) | Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Statistical Analysis and Data Mining | Pub Date: 2024-06-25 | DOI: 10.1002/sam.11702
Yong‐Shiuan Lee, Chia‐Chi Wu
Citations: 0

Abstract

Missing values are common, and handling them with an inappropriate method can lead to large classification errors. Empirical evidence shows that tree‐based classification algorithms such as classification and regression trees (CART) can benefit from imputation, especially multiple imputation. Nevertheless, little attention has been paid to incorporating multiple imputation into cost‐sensitive decision tree induction. This study focuses on the treatment of missing data within a time‐constrained minimal‐cost tree algorithm. We introduce several approaches to handling incomplete data into the algorithm: complete‐case analysis, a missing‐value branch, single imputation, feature acquisition, and multiple imputation. A simulation study under different scenarios examines the predictive performance of the proposed strategies. The simulation results show that combining the algorithm with multiple imputation can ensure classification accuracy within the budget. A real medical data example provides insight into the problem of missing values in cost‐sensitive learning and the advantages of the proposed methods.
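Three of the missing-data strategies named in the abstract can be illustrated on toy data. The following is only a minimal sketch and not the authors' algorithm: it ignores test costs and the time constraint entirely, stands in a one-split stump for the tree, and replaces proper multiple imputation with a simplified random draw from the observed values. The data and all function names are hypothetical.

```python
import random
from statistics import mean

# Toy records: (feature value or None if missing, class label).
# All values are made up for illustration.
DATA = [(1.0, 0), (2.0, 0), (None, 0), (8.0, 1), (9.0, 1), (None, 1)]


def complete_case(rows):
    """Complete-case analysis: drop every record with a missing value."""
    return [(x, y) for x, y in rows if x is not None]


def single_imputation(rows):
    """Single imputation: fill each gap once, here with the observed mean."""
    fill = mean(x for x, _ in rows if x is not None)
    return [(fill if x is None else x, y) for x, y in rows]


def multiple_imputation(rows, m=5, seed=0):
    """Simplified multiple imputation: build m completed copies, each
    filling gaps with a random draw from the observed values."""
    rng = random.Random(seed)
    observed = [x for x, _ in rows if x is not None]
    return [
        [(rng.choice(observed) if x is None else x, y) for x, y in rows]
        for _ in range(m)
    ]


def fit_stump(rows):
    """Fit a one-split tree: choose the threshold with the fewest training
    errors, predicting class 1 when the feature exceeds the threshold."""
    values = sorted({x for x, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in rows)

    return min(candidates, key=errors)


def predict_vote(thresholds, x):
    """Majority vote over the stumps fitted on the imputed copies."""
    votes = sum(x > t for t in thresholds)
    return int(2 * votes > len(thresholds))


# One stump per imputed copy, combined by voting at prediction time.
thresholds = [fit_stump(copy) for copy in multiple_imputation(DATA)]
print(len(complete_case(DATA)), single_imputation(DATA)[2][0])
print(predict_vote(thresholds, 1.5), predict_vote(thresholds, 8.5))
```

Complete-case analysis shrinks the training set from six records to four, while the two imputation variants keep all six. On this toy data the vote assigns 1.5 to class 0 and 8.5 to class 1.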
Source journal
Statistical Analysis and Data Mining
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
CiteScore: 3.20
Self-citation rate: 7.70%
Articles published: 43
Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that is accessible and beneficial to domain experts across science, engineering, and commerce. The journal focuses on papers that satisfy one or more of the following criteria: (1) solve data analysis problems associated with massive, complex datasets; (2) develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, operations research; (3) formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models; (4) provide surveys of prominent research topics.
Latest articles in this journal
Quantifying Epistemic Uncertainty in Binary Classification via Accuracy Gain
A new logarithmic multiplicative distortion for correlation analysis
Revisiting Winnow: A modified online feature selection algorithm for efficient binary classification
A random forest approach for interval selection in functional regression
Characterizing climate pathways using feature importance on echo state networks