Still More Shades of Null: A Benchmark for Responsible Missing Value Imputation

Falaah Arif Khan, Denys Herasymuk, Nazar Protsiv, Julia Stoyanovich
arXiv:2409.07510, arXiv - CS - Computers and Society, published 2024-09-11

Abstract

We present Shades-of-NULL, a benchmark for responsible missing value imputation. Our benchmark includes state-of-the-art imputation techniques and embeds them into the machine learning development lifecycle. We model realistic missingness scenarios that go beyond Rubin's classic Missing Completely at Random (MCAR), Missing At Random (MAR) and Missing Not At Random (MNAR) mechanisms to include multi-mechanism missingness (when different missingness patterns co-exist in the data) and missingness shift (when the missingness mechanism changes between training and test). Another key novelty of our work is that we evaluate imputers holistically, based on the predictive performance, fairness and stability of the models that are trained and tested on the data they produce. We use Shades-of-NULL to conduct a large-scale empirical study involving 20,952 experimental pipelines, and find that, while no single imputation approach performs best for all missingness types, interesting performance patterns do emerge when comparing imputer performance in simpler vs. more complex missingness scenarios. Further, while predictive performance, fairness and stability can be seen as orthogonal, we identify trade-offs among them that arise from the combination of the missingness scenario, the choice of imputer, and the architecture of the model trained on the data post-imputation. We make Shades-of-NULL publicly available, and hope to enable researchers to comprehensively and rigorously evaluate new missing value imputation methods on a wide range of evaluation metrics, in plausible and socially meaningful missingness scenarios.
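The missingness scenarios named in the abstract can be illustrated with a small simulation. What follows is a minimal sketch, not the Shades-of-NULL API. The function names `missing_mask` and `inject`, the above-median rule used for MAR and MNAR, the 30% missingness rate, and the synthetic Gaussian data are all illustrative assumptions. It uses only NumPy.

```python
# Illustrative sketch of missingness injection; not the Shades-of-NULL implementation.
import numpy as np


def missing_mask(X, col, mechanism, rate, rng, driver_col=None):
    """Return a boolean mask of rows whose value in column `col` is removed.

    MCAR: every row is equally likely to go missing.
    MAR:  missingness depends on another, fully observed column `driver_col`.
    MNAR: missingness depends on the value in `col` itself.
    For MAR and MNAR, only rows with an above-median driver value can go
    missing, each with probability 2 * rate, so roughly `rate` of all rows
    are affected. This rule is a hypothetical choice and assumes rate <= 0.5.
    """
    n = X.shape[0]
    if mechanism == "MCAR":
        return rng.random(n) < rate
    if mechanism == "MAR":
        if driver_col is None:
            raise ValueError("MAR needs a fully observed driver column")
        driver = X[:, driver_col]
    elif mechanism == "MNAR":
        driver = X[:, col]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return (driver > np.median(driver)) & (rng.random(n) < 2 * rate)


def inject(X, spec, rate, rng, driver_col=None):
    """Return a float copy of X with NaNs injected column by column.

    `spec` maps a column index to a mechanism name. Giving different columns
    different mechanisms yields multi-mechanism missingness. Masks are computed
    on the complete X. `driver_col` should not appear in `spec`, so that it
    stays fully observed, as MAR requires.
    """
    out = X.astype(float, copy=True)
    for col, mechanism in spec.items():
        mask = missing_mask(X, col, mechanism, rate, rng, driver_col)
        out[mask, col] = np.nan
    return out


rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
X_test = rng.normal(size=(1000, 3))

# Multi-mechanism: column 0 is MNAR while column 1 is MAR, driven by column 2.
train = inject(X_train, {0: "MNAR", 1: "MAR"}, rate=0.3, rng=rng, driver_col=2)

# Missingness shift: column 0 is MCAR at training time but MNAR at test time.
train_shift = inject(X_train, {0: "MCAR"}, rate=0.3, rng=rng)
test_shift = inject(X_test, {0: "MNAR"}, rate=0.3, rng=rng)

# Under this MNAR rule, large values disappear. The observed test mean of
# column 0 is therefore pulled down relative to the MCAR training data.
print(np.nanmean(train_shift[:, 0]), np.nanmean(test_shift[:, 0]))
```

In the shift example, consider a simple mean imputer fit on `train_shift`. It would learn a fill value close to the true column mean. At test time, however, the entries it fills are systematically the large ones. That mismatch is one concrete instance of what the abstract calls missingness shift.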