Feature Importance and Predictive Modeling for Multi-source Healthcare Data with Missing Values

Karthik Srinivasan, Faiz Currim, S. Ram, Casey Lindberg, Esther Sternberg, Perry Skeath, B. Najafi, J. Razjouyan, Hyo-Ki Lee, Colin Foe-Parker, Nicole Goebel, Reuben Herzl, M. Mehl, Brian Gilligan, J. Heerwagen, Kevin Kampschroer, Kelli Canada
Published in: Proceedings of the 6th International Conference on Digital Health Conference, 2016-04-11
DOI: 10.1145/2896338.2896347
Citations: 7

Abstract

With the rapid development of sensor technologies and the Internet of Things, research in connected health is growing in importance and complexity, with wide-reaching impacts on public health. As data sources such as mobile (wearable) sensors become cheaper, smaller, and smarter, important research questions can be answered by combining information from multiple data sources. However, integrating multiple heterogeneous data streams often yields a dataset with several empty cells, or missing values. The challenge is to use such sparsely populated integrated datasets without compromising model performance. Naïve approaches to modifying the dataset, such as discarding observations or replacing missing values ad hoc, often lead to misleading results. In this paper, we discuss and evaluate current best practices for modeling data with missing values and then propose an ensemble-learning-based framework for modeling sparse data. We develop a predictive model using this framework and compare it with existing models in a study conducted in a healthcare setting. Instead of generating a single importance score for each variable (feature), our framework lets the user understand a variable's importance based on the values actually present in the data and their localized impact on the outcome.
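To make the two naïve strategies named above concrete, the following Python sketch outer-joins three sources by participant ID and then applies listwise deletion (discarding incomplete observations) and column-mean imputation (one common form of ad-hoc replacement). The data, source names, and helper functions (`integrate`, `complete_cases`, `mean_impute`) are hypothetical illustrations, not the authors' framework or study data.

```python
from statistics import mean, pvariance

# Hypothetical sources keyed by participant ID. A participant absent from a
# source produces a missing cell (None) after integration.
wearable = {1: 62.0, 2: 71.0, 3: 58.0, 4: 80.0, 5: 66.0}   # e.g. resting heart rate
survey = {1: 3.0, 2: 4.0, 4: 2.0, 5: 5.0, 6: 1.0}          # e.g. self-reported stress
environment = {1: 21.5, 3: 23.0, 4: 22.0, 6: 20.5}         # e.g. room temperature


def integrate(*sources):
    """Outer-join the sources on participant ID; missing cells become None."""
    ids = sorted(set().union(*sources))
    return [[src.get(pid) for src in sources] for pid in ids]


def complete_cases(rows):
    """Listwise deletion: keep only rows in which every cell is observed."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each missing cell with the mean of that column's observed values."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(r, means)] for r in rows]


rows = integrate(wearable, survey, environment)
print(len(rows), len(complete_cases(rows)))  # 6 2

# Mean imputation keeps every row but pulls the column toward its mean,
# so the variance of the filled-in column shrinks.
hr_observed = [r[0] for r in rows if r[0] is not None]
hr_imputed = [r[0] for r in mean_impute(rows)]
print(pvariance(hr_observed), pvariance(hr_imputed))
```

In this toy example, listwise deletion keeps only 2 of 6 participants, while mean imputation keeps all 6 but understates the spread of the heart-rate column (roughly 48.5 versus 58.2), which is the kind of distortion the abstract warns can mislead downstream models.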