A SEMIPARAMETRIC METHOD FOR RISK PREDICTION USING INTEGRATED ELECTRONIC HEALTH RECORD DATA.

IF 1.4 · Zone 4 (Mathematics) · JCR Q2, STATISTICS & PROBABILITY · Annals of Applied Statistics, vol. 18, no. 4, pp. 3318-3337 · Pub Date: 2024-12-01 · Epub Date: 2024-10-31 · DOI: 10.1214/24-AOAS1938
Jill Hasler, Yanyuan Ma, Yizheng Wei, Ravi Parikh, Jinbo Chen
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934126/pdf/
Citations: 0

Abstract

When using electronic health records (EHRs) for clinical and translational research, additional data is often available from external sources to enrich the information extracted from EHRs. For example, academic biobanks have more granular data available, and patient-reported data is often collected through small-scale surveys. Commonly, the external data is available only for a small subset of the patients who have EHR information. We propose efficient and robust methods for building and evaluating models for predicting the risk of binary outcomes using such integrated EHR data. Our method builds on an idea from the two-phase design literature: modeling the availability of a patient's external data as a function of an EHR-based preliminary predictive score leads to effective utilization of the EHR data. Through both theoretical and simulation studies, we show that our method has high efficiency for estimating log-odds ratio parameters, the area under the ROC curve, and other measures of predictive accuracy. We apply our method to develop a model for predicting the short-term mortality risk of oncology patients, where the data was extracted from the University of Pennsylvania hospital system EHR and combined with survey-based patient-reported outcome data.
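To make the two-phase idea concrete, the following Python sketch simulates a cohort in which an external covariate is observed only for a subset of patients, with availability depending on an EHR-only preliminary score. It then estimates the area under the ROC curve from that subset using plain inverse-probability weighting. This is not the authors' semiparametric estimator, only a simpler stand-in that uses the same ingredients. The coefficients, the availability rule `pi = 0.1 + 0.8 * prelim`, and all function names are assumptions chosen for illustration.

```python
import math
import random


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def weighted_auc(scores, labels, weights):
    """Weighted estimate of P(case score > control score); ties count 1/2."""
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    num = 0.0
    den = 0.0
    for s1, w1 in cases:
        for s0, w0 in controls:
            pair_w = w1 * w0
            den += pair_w
            if s1 > s0:
                num += pair_w
            elif s1 == s0:
                num += 0.5 * pair_w
    return num / den


def simulate(n, seed=0):
    """Return rows (x, z, y, pi, r); all model constants are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)  # EHR covariate, observed for every patient
        z = rng.gauss(0, 1)  # external covariate, observed only when r == 1
        y = 1 if rng.random() < expit(-2.0 + x + z) else 0
        prelim = expit(-2.0 + x)  # assumed EHR-only preliminary risk score
        pi = 0.1 + 0.8 * prelim   # assumed availability probability
        r = 1 if rng.random() < pi else 0
        rows.append((x, z, y, pi, r))
    return rows


rows = simulate(2000)

# Benchmark: AUC of the score x + z if z were available for everyone.
auc_full = weighted_auc([x + z for x, z, _, _, _ in rows],
                        [y for _, _, y, _, _ in rows],
                        [1.0] * len(rows))

# Phase-two subset: only patients whose external data is available.
sub = [row for row in rows if row[4] == 1]
sub_scores = [x + z for x, z, _, _, _ in sub]
sub_labels = [y for _, _, y, _, _ in sub]

auc_naive = weighted_auc(sub_scores, sub_labels, [1.0] * len(sub))
auc_ipw = weighted_auc(sub_scores, sub_labels,
                       [1.0 / pi for _, _, _, pi, _ in sub])

print("full-cohort AUC:", round(auc_full, 3))
print("naive subset AUC:", round(auc_naive, 3))
print("IPW subset AUC:", round(auc_ipw, 3))
```

Because availability here depends on the preliminary score, the unweighted subset over-represents high-risk patients, while the weights `1 / pi` restore the cohort's composition. In this sketch the availability probabilities are known by construction; in practice they would have to be modeled and estimated.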

Source journal
Annals of Applied Statistics (Social Sciences: Statistics & Probability)
CiteScore: 3.10
Self-citation rate: 5.60%
Articles published: 131
Review time: 6-12 weeks
Journal description: Statistical research spans an enormous range from direct subject-matter collaborations to pure mathematical theory. The Annals of Applied Statistics, the newest journal from the IMS, is aimed at papers in the applied half of this range. Published quarterly in both print and electronic form, the journal aims to provide a timely and unified forum for all areas of applied statistics.