meds_reader: A fast and efficient EHR processing library

arXiv - CS - Databases. Pub Date: 2024-09-12. DOI: arxiv-2409.09095

Ethan Steinberg, Michael Wornow, Suhana Bedi, Jason Alan Fries, Matthew B. A. McDermott, Nigam H. Shah

Citations: 0

Abstract

The growing demand for machine learning in healthcare requires processing increasingly large electronic health record (EHR) datasets, but existing pipelines are not computationally efficient or scalable. In this paper, we introduce meds_reader, an optimized Python package for efficient EHR data processing that is designed to take advantage of many intrinsic properties of EHR data for improved speed. We then demonstrate the benefits of meds_reader by reimplementing key components of two major EHR processing pipelines, achieving 10-100x improvements in memory, speed, and disk usage. The code for meds_reader can be found at https://github.com/som-shahlab/meds_reader.


