An open-source framework for end-to-end analysis of electronic health record data

Nature Medicine · IF 58.7 · Region 1 (Medicine) · JCR Q1 (Biochemistry & Molecular Biology) · Pub Date: 2024-09-12 · DOI: 10.1038/s41591-024-03214-0
Lukas Heumos, Philipp Ehmele, Tim Treis, Julius Upmeier zu Belzen, Eljas Roellin, Lilly May, Altana Namsaraeva, Nastassya Horlava, Vladimir A. Shitov, Xinyue Zhang, Luke Zappia, Rainer Knoll, Niklas J. Lang, Leon Hetzel, Isaac Virshup, Lisa Sikkema, Fabiola Curion, Roland Eils, Herbert B. Schiller, Anne Hilgendorff, Fabian J. Theis
Nature Medicine, volume 30, issue 11, pages 3369-3380. https://www.nature.com/articles/s41591-024-03214-0

Abstract

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community. 
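To make the survival analysis mentioned in the abstract concrete, here is a minimal sketch of a Kaplan-Meier estimator in plain Python. It is not ehrapy's API and not the authors' implementation; the function name `kaplan_meier` and the toy cohort are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    # Count observed events per time point; censored patients are not counted here.
    observed = Counter(t for t, e in zip(durations, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(observed):
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - observed[t] / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort with made-up numbers (hypothetical, not from the paper).
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 3))
# prints:
# 2 0.8
# 3 0.6
# 5 0.3
```

Comparing such curves between patient groups is one common way to look for survival differences like those the abstract describes for the pneumonia phenotypes.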
ehrapy is an open-source software package that incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations and longitudinal analyses, and is proposed to standardize current electronic health record data processing and analysis pipelines.
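The path from quality control to a low-dimensional representation can be sketched roughly as follows. This is a generic NumPy illustration on synthetic data, not ehrapy's API; the helper names (`quality_control`, `impute_mean`, `pca`), the 50% missingness threshold, and the choice of mean imputation and PCA are all assumptions made for the example.

```python
import numpy as np


def quality_control(X, max_missing_fraction=0.5):
    """Drop features (columns) whose fraction of missing values exceeds the threshold."""
    missing_fraction = np.isnan(X).mean(axis=0)
    keep = missing_fraction <= max_missing_fraction
    return X[:, keep], keep


def impute_mean(X):
    """Replace each missing value with the mean of the observed values in its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def pca(X, n_components=2):
    """Project standardized data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Xs = Xc / std
    U, S, _ = np.linalg.svd(Xs, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Synthetic stand-in for a patients x measurements table (not real EHR data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[rng.random(X.shape) < 0.1] = np.nan  # roughly 10% of values missing at random
X[:, 5] = np.nan                       # one feature that is entirely missing

X_qc, kept = quality_control(X)        # removes the all-missing feature
X_imputed = impute_mean(X_qc)          # fills the remaining gaps
embedding = pca(X_imputed, n_components=2)
print(embedding.shape)  # (100, 2)
```

Each patient ends up as a point in two dimensions, which is the kind of representation on which clustering or trajectory analyses can then be run.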

Source journal: Nature Medicine (Medicine; Biochemistry & Molecular Biology)
CiteScore: 100.90
Self-citation rate: 0.70%
Articles published: 525
Review time: 1 month
Journal description: Nature Medicine is a monthly journal publishing original peer-reviewed research in all areas of medicine. The publication focuses on originality, timeliness, interdisciplinary interest, and the impact on improving human health. In addition to research articles, Nature Medicine also publishes commissioned content such as News, Reviews, and Perspectives. This content aims to provide context for the latest advances in translational and clinical research, reaching a wide audience of M.D. and Ph.D. readers. All editorial decisions for the journal are made by a team of full-time professional editors. Nature Medicine considers all types of clinical research, including:
- Case reports and small case series
- Clinical trials, whether phase 1, 2, 3 or 4
- Observational studies
- Meta-analyses
- Biomarker studies
- Public and global health studies
Nature Medicine is also committed to facilitating communication between translational and clinical researchers. As such, it considers "hybrid" studies with preclinical and translational findings reported alongside data from clinical studies.