EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records
Jiawei Luo, Shixin Huang, Lan Lan, Shu Yang, Tingqian Cao, Jin Yin, Jiajun Qiu, Xiaoyan Yang, Yingqiang Guo, Xiaobo Zhou
Computer Methods and Programs in Biomedicine, volume 259, Article 108521. Published 2024-11-24. DOI: 10.1016/j.cmpb.2024.108521
https://www.sciencedirect.com/science/article/pii/S0169260724005145
Abstract
Objective
Longitudinal data from Electronic Medical Records (EMRs) are increasingly utilized to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, significant discrepancies exist in preprocessing the irregular and intricate EMR data across studies due to the absence of universally accepted tools and standardization methods. This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability.
Materials and Methods
EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner, designing specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP was then used to test several renowned deep learning models on a range of commonly used benchmark tasks.
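The idea of treating each variable category differently when placing irregular records on a regular time grid can be sketched as follows. This is a minimal illustration using only the Python standard library, not the EMR-LIP API: the category names (`vital`, `lab`, `intervention`), the aggregation rules, and the functions `resample` and `forward_fill` are all hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical variable categories and aggregation rules (illustrative only,
# not the EMR-LIP taxonomy).
AGGREGATORS = {
    "vital": mean,                     # e.g. heart rate: average within the bin
    "lab": lambda values: values[-1],  # e.g. lactate: keep the latest value
    "intervention": max,               # e.g. ventilation flag: 1 if ever on
}


def resample(events, var_types, interval_hours, n_bins):
    """Bin irregular (hour, variable, value) events onto a regular grid.

    Returns {variable: [value or None per bin]}; None marks an empty bin.
    """
    binned = defaultdict(lambda: defaultdict(list))
    # Sort by time so that "latest value" rules see events in order.
    for hour, variable, value in sorted(events, key=lambda e: e[0]):
        index = int(hour // interval_hours)
        if 0 <= index < n_bins:
            binned[variable][index].append(value)

    grid = {}
    for variable, bins in binned.items():
        aggregate = AGGREGATORS[var_types[variable]]
        grid[variable] = [
            aggregate(bins[i]) if i in bins else None for i in range(n_bins)
        ]
    return grid


def forward_fill(series):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for value in series:
        last = value if value is not None else last
        filled.append(last)
    return filled


# Toy records for one patient: (hours since admission, variable, value).
events = [
    (0.5, "hr", 80), (1.5, "hr", 90), (5.0, "hr", 100),
    (1.0, "lactate", 2.0), (3.0, "lactate", 1.5),
    (9.0, "vent", 1),
]
var_types = {"hr": "vital", "lactate": "lab", "vent": "intervention"}
grid = resample(events, var_types, interval_hours=4, n_bins=3)
```

Changing `interval_hours` regenerates the grid at a different resolution, which is the kind of setting one would vary when checking how sensitive a model is to the resampling interval.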
Results
In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups.
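For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and comparing it across subgroups is one simple way to look at performance fairness. The sketch below computes it by direct pairwise comparison on made-up labels and scores; the function names `auroc` and `auroc_by_group` and the toy data are assumptions for illustration and do not reproduce the paper's evaluation code or results.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None if a class is missing.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auroc_by_group(labels, scores, groups):
    """Compute AUROC separately for each subgroup label."""
    result = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        result[group] = auroc(
            [labels[i] for i in idx], [scores[i] for i in idx]
        )
    return result


# Made-up outcomes (1 = in-hospital death), model scores, and subgroups.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.1, 0.8, 0.7, 0.6, 0.65, 0.2, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall = auroc(labels, scores)
per_group = auroc_by_group(labels, scores, groups)
```

The pairwise form is quadratic in the number of cases, so it is suitable only for small examples; a rank-based implementation would be the usual choice on full cohorts.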
Conclusion
EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to support the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.