Generalized Mixed Modeling in Massive Electronic Health Record Databases: What is a Healthy Serum Potassium?

Cristian-Sorin Bologa, V. Pankratz, M. Unruh, M. Roumelioti, V. Shah, S. K. Shaffi, Soraya Arzhan, John Cook, C. Argyropoulos
{"title":"大规模电子健康记录数据库的广义混合建模:什么是健康的血清钾?","authors":"Cristian-Sorin Bologa, V. Pankratz, M. Unruh, M. Roumelioti, V. Shah, S. K. Shaffi, Soraya Arzhan, John Cook, C. Argyropoulos","doi":"10.21203/rs.3.rs-245946/v1","DOIUrl":null,"url":null,"abstract":"Converting electronic health record (EHR) entries to useful clinical inferences requires one to address computational challenges due to the large number of repeated observations in individual patients. Unfortunately, the libraries of major statistical environments which implement Generalized Linear Mixed Models for such analyses have been shown to scale poorly in big datasets. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve hundreds of thousands or millions of dimensions (one for each patient). The Laplace Approximation (LA) plays a major role in the development of the theory of GLMMs and it can approximate integrals in high dimensions with acceptable accuracy. We thus examined the scalability of Laplace based calculations for GLMMs. To do so we coded GLMMs in the R package TMB. TMB numerically optimizes complex likelihood expressions in a parallelizable manner by combining the LA with algorithmic differentiation (AD). We report on the feasibility of this approach to support clinical inferences in the HyperKalemia Benchmark Problem (HKBP). In the HKBP we associate potassium levels and their trajectories over time with survival in all patients in the Cerner Health Facts EHR database. Analyzing the HKBP requires the evaluation of an integral in over 10 million dimensions. The scale of this problem puts far beyond the reach of methodologies currently available. The major clinical inferences in this problem is the establishment of a population response curve that relates the potassium level with mortality, and an estimate of the variability of individual risk in the population. Based on our experience on the HKBP we conclude that the combination of the LA and AD offers a computationally efficient approach for the analysis of big repeated measures data with GLMMs.","PeriodicalId":409996,"journal":{"name":"arXiv: Applications","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Generalized Mixed Modeling in Massive Electronic Health Record Databases: What is a Healthy Serum Potassium?\",\"authors\":\"Cristian-Sorin Bologa, V. Pankratz, M. Unruh, M. Roumelioti, V. Shah, S. K. Shaffi, Soraya Arzhan, John Cook, C. Argyropoulos\",\"doi\":\"10.21203/rs.3.rs-245946/v1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Converting electronic health record (EHR) entries to useful clinical inferences requires one to address computational challenges due to the large number of repeated observations in individual patients. Unfortunately, the libraries of major statistical environments which implement Generalized Linear Mixed Models for such analyses have been shown to scale poorly in big datasets. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve hundreds of thousands or millions of dimensions (one for each patient). The Laplace Approximation (LA) plays a major role in the development of the theory of GLMMs and it can approximate integrals in high dimensions with acceptable accuracy. 
We thus examined the scalability of Laplace based calculations for GLMMs. To do so we coded GLMMs in the R package TMB. TMB numerically optimizes complex likelihood expressions in a parallelizable manner by combining the LA with algorithmic differentiation (AD). We report on the feasibility of this approach to support clinical inferences in the HyperKalemia Benchmark Problem (HKBP). In the HKBP we associate potassium levels and their trajectories over time with survival in all patients in the Cerner Health Facts EHR database. Analyzing the HKBP requires the evaluation of an integral in over 10 million dimensions. The scale of this problem puts far beyond the reach of methodologies currently available. The major clinical inferences in this problem is the establishment of a population response curve that relates the potassium level with mortality, and an estimate of the variability of individual risk in the population. Based on our experience on the HKBP we conclude that the combination of the LA and AD offers a computationally efficient approach for the analysis of big repeated measures data with GLMMs.\",\"PeriodicalId\":409996,\"journal\":{\"name\":\"arXiv: Applications\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv: Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21203/rs.3.rs-245946/v1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv: Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21203/rs.3.rs-245946/v1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

Converting electronic health record (EHR) entries into useful clinical inferences requires addressing the computational challenges posed by the large number of repeated observations on individual patients. Unfortunately, the libraries of major statistical environments that implement Generalized Linear Mixed Models (GLMMs) for such analyses have been shown to scale poorly to big datasets. The major computational bottleneck is the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve hundreds of thousands or millions of dimensions (one for each patient). The Laplace Approximation (LA) plays a major role in the development of GLMM theory and can approximate integrals in high dimensions with acceptable accuracy. We therefore examined the scalability of Laplace-based calculations for GLMMs. To do so, we coded GLMMs in the R package TMB. TMB numerically optimizes complex likelihood expressions in a parallelizable manner by combining the LA with algorithmic differentiation (AD). We report on the feasibility of this approach to support clinical inferences in the Hyperkalemia Benchmark Problem (HKBP), in which we associate potassium levels and their trajectories over time with survival for all patients in the Cerner Health Facts EHR database. Analyzing the HKBP requires the evaluation of an integral in over 10 million dimensions, a scale far beyond the reach of currently available methodologies. The major clinical inferences in this problem are the establishment of a population response curve relating potassium level to mortality and an estimate of the variability of individual risk in the population. Based on our experience with the HKBP, we conclude that the combination of the LA and AD offers a computationally efficient approach for the analysis of big repeated-measures data with GLMMs.
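
To illustrate the role the abstract assigns to the Laplace Approximation, the sketch below (not the authors' code; it assumes, purely for illustration, a logistic GLMM with a single random intercept per patient) shows how the LA collapses each patient's one-dimensional random-effect integral into a mode-finding step plus a curvature correction.

```r
## Minimal sketch of the Laplace Approximation (LA) for one patient's
## random-effect integral in an assumed random-intercept logistic GLMM.
## With independent intercepts the marginal likelihood factorizes over
## patients, so each patient contributes
##   L_i(beta, sigma) = integral of f(y_i | u) * dnorm(u, 0, sigma) du,
## which the LA approximates from the integrand's mode and curvature.
laplace_patient_loglik <- function(y, x, beta, sigma) {
  # joint log-density of one patient's Bernoulli outcomes and intercept u
  h <- function(u) {
    eta <- beta[1] + beta[2] * x + u
    sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE)) +
      dnorm(u, mean = 0, sd = sigma, log = TRUE)
  }
  # inner step of the LA: locate the mode of h(u)
  opt   <- optimize(h, interval = c(-10, 10), maximum = TRUE)
  u_hat <- opt$maximum
  # numerical second derivative of h at the mode (curvature term)
  eps <- 1e-4
  h2  <- (h(u_hat + eps) - 2 * h(u_hat) + h(u_hat - eps)) / eps^2
  # log LA: h(u_hat) + 0.5*log(2*pi) - 0.5*log(-h''(u_hat))
  opt$objective + 0.5 * log(2 * pi) - 0.5 * log(-h2)
}

## toy check on simulated data for a single "patient"
set.seed(1)
x      <- rnorm(20)
u_true <- rnorm(1, sd = 0.7)
y      <- rbinom(20, 1, plogis(-0.5 + 0.3 * x + u_true))
laplace_patient_loglik(y, x, beta = c(-0.5, 0.3), sigma = 0.7)
```

TMB applies the same idea jointly across all random effects, finding the mode by an inner optimization and supplying every derivative by algorithmic differentiation, which is what makes the millions of per-patient dimensions tractable.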
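The abstract's statement that the GLMMs were coded in the R package TMB corresponds to a workflow along the following lines; the template file name hkbp_glmm.cpp, the parameter names, and the data objects are illustrative assumptions, not the authors' actual model.

```r
library(TMB)

## optional: evaluate the likelihood and its AD gradient on several threads
# TMB::openmp(4)

## the C++ template (assumed here to be "hkbp_glmm.cpp") defines the joint
## negative log-likelihood of the data and the per-patient random effects u
compile("hkbp_glmm.cpp")
dyn.load(dynlib("hkbp_glmm"))

obj <- MakeADFun(
  data       = list(y = y, x = x, patient = patient_id),
  parameters = list(beta = c(0, 0), log_sigma_u = 0,
                    u = rep(0, n_patients)),
  random     = "u",   # integrate u out with the Laplace Approximation
  DLL        = "hkbp_glmm"
)

## outer optimization over the fixed effects and variance parameters only;
## gradients are exact, supplied by algorithmic differentiation
opt <- nlminb(obj$par, obj$fn, obj$gr)

## standard errors for the fixed effects
rep <- sdreport(obj)
summary(rep, select = "fixed")
```

Declaring u in the random argument is what triggers the Laplace Approximation over the per-patient dimensions, so the outer optimization only ever sees the handful of fixed-effect and variance parameters.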