Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS

IF 5.4 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Journal of Statistical Software | Pub Date: 2023-01-01 | DOI: 10.18637/jss.v107.i09
Ranjit Lall, Thomas Robinson
Citations: 2

Abstract

This paper introduces software packages for efficiently imputing missing data using deep learning methods in Python (MIDASpy) and R (rMIDAS). The packages implement a recently developed approach to multiple imputation known as MIDAS, which involves introducing additional missing values into the dataset, attempting to reconstruct these values with a type of unsupervised neural network known as a denoising autoencoder, and using the resulting model to draw imputations of originally missing data. These steps are executed by a fast and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their particular application, MIDASpy and rMIDAS offer a host of user-friendly tools for calibrating and validating the imputation model. We provide a detailed guide to these functionalities and demonstrate their usage on a large real dataset.
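The three steps named in the abstract (introduce extra missingness, reconstruct it with a denoising autoencoder, then draw imputations for the originally missing cells) can be illustrated with a toy NumPy sketch. This is not the MIDASpy or rMIDAS implementation: the function name `midas_style_impute`, its parameters, the one-hidden-layer network, and the use of fresh input corruption to make each draw stochastic are all assumptions chosen to keep the example short.

```python
import numpy as np

def midas_style_impute(X, m=5, hidden=8, epochs=300, corrupt_rate=0.5, lr=0.05, seed=0):
    """Toy denoising-autoencoder imputer: returns a list of m completed copies of X."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)                        # True where a value was recorded
    mu, sd = np.nanmean(X, axis=0), np.nanstd(X, axis=0) + 1e-8
    Z = np.where(observed, (X - mu) / sd, 0.0)     # standardize; missing cells become 0
    n, d = Z.shape
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, d)), np.zeros(d)

    def forward(Zin):
        H = np.tanh(Zin @ W1 + b1)                 # hidden representation
        return H, H @ W2 + b2                      # linear reconstruction

    for _ in range(epochs):
        # Step 1: introduce additional missingness by zeroing a random subset of inputs.
        keep = rng.random((n, d)) > corrupt_rate
        Zin = Z * keep
        # Step 2: reconstruct; the squared-error loss counts only originally observed cells.
        H, out = forward(Zin)
        err = (out - Z) * observed / n
        dH = (err @ W2.T) * (1.0 - H**2)           # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (Zin.T @ dH)
        b1 -= lr * dH.sum(axis=0)

    # Step 3: draw m imputations. Each draw uses a fresh corruption mask (an assumption
    # of this sketch) so that the completed datasets differ from one another.
    draws = []
    for _ in range(m):
        keep = rng.random((n, d)) > corrupt_rate
        _, out = forward(Z * keep)
        draws.append(np.where(observed, X, out * sd + mu))  # keep observed values as-is
    return draws

# Synthetic demo data (not from the paper): column 2 is predictable from column 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)
X[rng.random(X.shape) < 0.2] = np.nan              # roughly 20% of cells set to missing
draws = midas_style_impute(X, m=5)
```

Each element of `draws` is a completed copy of `X` in which observed cells are untouched and only the originally missing cells are filled in. In a multiple-imputation workflow, the analysis model would be fitted to each completed dataset and the estimates combined afterwards.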