Mehak Arora, Hassan Mortagy, Nathan Dwarshuis, Jeffrey Wang, Philip Yang, Andre L Holder, Swati Gupta, Rishikesan Kamaleswaran
Objective: To develop an electronic medical record (EMR) data processing tool that confers clinical context to machine learning (ML) algorithms for error handling, bias mitigation, and interpretability.
Materials and methods: We present Trust-MAPS, an algorithm that translates clinical domain knowledge into high-dimensional, mixed-integer programming models that capture physiological and biological constraints on clinical measurements. EMR data are projected onto this constrained space, effectively bringing outliers to fall within a physiologically feasible range. We then compute the distance of each data point from the constrained space modeling healthy physiology to quantify deviation from the norm. These distances, termed "trust-scores," are integrated into the feature space for downstream ML applications. We demonstrate the utility of Trust-MAPS by training a binary classifier for early sepsis prediction on data from the 2019 PhysioNet Computing in Cardiology Challenge, using the XGBoost algorithm and applying SMOTE to address class imbalance.
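The projection-and-distance idea can be illustrated with a deliberately simplified sketch. The paper solves mixed-integer programs encoding cross-vital constraints; here we substitute plain box constraints, so the projection is just clipping. All vitals, bounds, and healthy ranges below are hypothetical illustrations, not values from the study.

```python
import numpy as np

# Hypothetical physiologically feasible bounds for three vitals:
# heart rate (bpm), SpO2 (%), temperature (degrees C).
LOWER = np.array([20.0, 50.0, 30.0])
UPPER = np.array([250.0, 100.0, 43.0])
# Narrower hypothetical "healthy physiology" box used to score deviation.
HEALTHY_LO = np.array([60.0, 94.0, 36.1])
HEALTHY_HI = np.array([100.0, 100.0, 37.2])

def project(x, lo, hi):
    """Euclidean projection onto the box [lo, hi] -- a stand-in for the
    paper's projection onto a mixed-integer feasible region."""
    return np.clip(x, lo, hi)

def trust_score(x):
    """Distance from the healthy-physiology region, computed after an
    error-handling projection that repairs implausible values."""
    x_feasible = project(x, LOWER, UPPER)
    x_healthy = project(x_feasible, HEALTHY_LO, HEALTHY_HI)
    return float(np.linalg.norm(x_feasible - x_healthy))

# A tachycardic, hypoxic, febrile reading deviates from the norm;
# a reading inside the healthy box scores exactly zero.
print(trust_score(np.array([135.0, 88.0, 38.5])))  # > 0
print(trust_score(np.array([72.0, 98.0, 36.8])))   # 0.0
```

The real models also encode dependencies between measurements (eg, between blood pressure components), which a per-feature box cannot express.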
Results: The Trust-MAPS framework shows desirable behavior in handling potential errors and boosting predictive performance. We achieve an area under the receiver operating characteristic curve of 0.91 (95% CI, 0.89-0.92) for predicting sepsis 6 hours before onset, a marked 15% improvement over a baseline model trained without Trust-MAPS.
Discussion: Downstream classification performance improves after Trust-MAPS preprocessing, highlighting the bias-reducing capabilities of the error-handling projections. Trust-scores emerge as clinically meaningful features that not only boost predictive performance for clinical decision support tasks but also lend interpretability to ML models.
Conclusion: This work is the first to translate clinical domain knowledge into mathematical constraints, model cross-vital dependencies, and identify aberrations in high-dimensional medical data. Our method allows for error handling in EMR and confers interpretability and superior predictive power to models trained for clinical decision support.
{"title":"Improving clinical decision support through interpretable machine learning and error handling in electronic health records.","authors":"Mehak Arora, Hassan Mortagy, Nathan Dwarshuis, Jeffrey Wang, Philip Yang, Andre L Holder, Swati Gupta, Rishikesan Kamaleswaran","doi":"10.1093/jamia/ocaf058","DOIUrl":"10.1093/jamia/ocaf058","url":null,"abstract":"<p><strong>Objective: </strong>To develop an electronic medical record (EMR) data processing tool that confers clinical context to machine learning (ML) algorithms for error handling, bias mitigation, and interpretability.</p><p><strong>Materials and methods: </strong>We present Trust-MAPS, an algorithm that translates clinical domain knowledge into high-dimensional, mixed-integer programming models that capture physiological and biological constraints on clinical measurements. EMR data are projected onto this constrained space, effectively bringing outliers to fall within a physiologically feasible range. We then compute the distance of each data point from the constrained space modeling healthy physiology to quantify deviation from the norm. These distances, termed \"trust-scores,\" are integrated into the feature space for downstream ML applications. We demonstrate the utility of Trust-MAPS by training a binary classifier for early sepsis prediction on data from the 2019 PhysioNet Computing in Cardiology Challenge, using the XGBoost algorithm and applying SMOTE for overcoming class-imbalance.</p><p><strong>Results: </strong>The Trust-MAPS framework shows desirable behavior in handling potential errors and boosting predictive performance. 
We achieve an area under the receiver operating characteristic curve of 0.91 (95% CI, 0.89-0.92) for predicting sepsis 6 hours before onset-a marked 15% improvement over a baseline model trained without Trust-MAPS.</p><p><strong>Discussions: </strong>Downstream classification performance improves after Trust-MAPS preprocessing, highlighting the bias reducing capabilities of the error-handling projections. Trust-scores emerge as clinically meaningful features that not only boost predictive performance for clinical decision support tasks but also lend interpretability to ML models.</p><p><strong>Conclusion: </strong>This work is the first to translate clinical domain knowledge into mathematical constraints, model cross-vital dependencies, and identify aberrations in high-dimensional medical data. Our method allows for error handling in EMR and confers interpretability and superior predictive power to models trained for clinical decision support.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"123-132"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758464/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144003672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objectives: The objective of this study is to provide an overview of the current landscape of individualized treatment effects (ITE) estimation, specifically focusing on methodologies proposed for time-series electronic health records (EHRs). We aim to identify gaps in the literature, discuss challenges, and propose future research directions to advance the field of personalized medicine.
Materials and methods: We conducted a comprehensive literature review to identify and analyze relevant works on ITE estimation for time-series data. The review focused on theoretical assumptions, types of treatment settings, and computational frameworks employed in the existing literature.
Results: The literature reveals a growing body of work on ITE estimation for tabular data, while methodologies specific to time-series EHRs are limited. We summarize and discuss the latest advancements, including the types of models proposed, the theoretical foundations, and the computational approaches used.
Discussion: The limitations and challenges of current ITE estimation methods for time-series data are discussed, including the lack of standardized evaluation metrics and the need for more diverse and representative datasets. We also highlight considerations and potential biases that may arise in personalized treatment effect estimation.
Conclusion: This work provides a comprehensive overview of ITE estimation for time-series EHR data, offering insights into the current state of the field and identifying future research directions. By addressing the limitations and challenges, we hope to encourage further exploration and innovation in this exciting and under-studied area of personalized medicine.
{"title":"A perspective on individualized treatment effects estimation from time-series health data.","authors":"Ghadeer O Ghosheh, Moritz Gögl, Tingting Zhu","doi":"10.1093/jamia/ocae323","DOIUrl":"10.1093/jamia/ocae323","url":null,"abstract":"<p><strong>Objectives: </strong>The objective of this study is to provide an overview of the current landscape of individualized treatment effects (ITE) estimation, specifically focusing on methodologies proposed for time-series electronic health records (EHRs). We aim to identify gaps in the literature, discuss challenges, and propose future research directions to advance the field of personalized medicine.</p><p><strong>Materials and methods: </strong>We conducted a comprehensive literature review to identify and analyze relevant works on ITE estimation for time-series data. The review focused on theoretical assumptions, types of treatment settings, and computational frameworks employed in the existing literature.</p><p><strong>Results: </strong>The literature reveals a growing body of work on ITE estimation for tabular data, while methodologies specific to time-series EHRs are limited. We summarize and discuss the latest advancements, including the types of models proposed, the theoretical foundations, and the computational approaches used.</p><p><strong>Discussion: </strong>The limitations and challenges of current ITE estimation methods for time-series data are discussed, including the lack of standardized evaluation metrics and the need for more diverse and representative datasets. We also highlight considerations and potential biases that may arise in personalized treatment effect estimation.</p><p><strong>Conclusion: </strong>This work provides a comprehensive overview of ITE estimation for time-series EHR data, offering insights into the current state of the field and identifying future research directions. 
By addressing the limitations and challenges, we hope to encourage further exploration and innovation in this exciting and under-studied area of personalized medicine.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"234-241"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758458/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143558469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective: Machine learning applications for longitudinal electronic health records often forecast the risk of events at fixed time points, whereas survival analysis achieves dynamic risk prediction by estimating time-to-event distributions. Here, we propose a novel conditional variational autoencoder-based method, DySurv, which uses a combination of static and longitudinal measurements from electronic health records to estimate the individual risk of death dynamically.
Materials and methods: DySurv directly estimates the cumulative risk incidence function without making any parametric assumptions on the underlying stochastic process of the time-to-event. We evaluate DySurv on 6 time-to-event benchmark datasets in healthcare, as well as 2 real-world intensive care unit (ICU) electronic health records (EHR) datasets extracted from the eICU Collaborative Research (eICU) and the Medical Information Mart for Intensive Care database (MIMIC-IV).
Results: DySurv outperforms other existing statistical and deep learning approaches to time-to-event analysis across concordance and other metrics. It achieves time-dependent concordance of over 60% in the eICU case. It is also over 12% more accurate and 22% more sensitive than in-use ICU scores like Acute Physiology and Chronic Health Evaluation (APACHE) and Sequential Organ Failure Assessment (SOFA) scores. The predictive capacity of DySurv is consistent and the survival estimates remain disentangled across different datasets.
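The concordance reported above can be made concrete with a plain Harrell's concordance index: the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event first. This is a sketch of the standard metric, not the time-dependent variant DySurv reports, and the data below are synthetic.

```python
import numpy as np

def concordance_index(event_time, predicted_risk, event_observed):
    """Harrell's C: among comparable pairs (the earlier time is an
    observed event), the proportion where higher predicted risk matches
    the earlier event, counting risk ties as half-concordant."""
    n_concordant, n_tied, n_comparable = 0, 0, 0
    n = len(event_time)
    for i in range(n):
        if not event_observed[i]:
            continue  # a censored earlier time makes the pair incomparable
        for j in range(n):
            if event_time[i] < event_time[j]:
                n_comparable += 1
                if predicted_risk[i] > predicted_risk[j]:
                    n_concordant += 1
                elif predicted_risk[i] == predicted_risk[j]:
                    n_tied += 1
    return (n_concordant + 0.5 * n_tied) / n_comparable

# Synthetic cohort: times, event indicators (0 = censored), risks.
t = np.array([2.0, 5.0, 9.0, 12.0])
e = np.array([1, 1, 0, 1])
risk = np.array([0.9, 0.6, 0.4, 0.1])  # perfectly rank-ordered
print(concordance_index(t, risk, e))  # 1.0
```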
Discussion: Our interdisciplinary framework successfully incorporates deep learning, survival analysis, and intensive care to create a novel method for time-to-event prediction from longitudinal health records. We test our method on several held-out test sets from a variety of healthcare datasets and compare it to existing in-use clinical risk scoring benchmarks.
Conclusion: While our method leverages non-parametric extensions to deep learning-guided estimations of the survival distribution, further deep learning paradigms could be explored.
{"title":"DySurv: dynamic deep learning model for survival analysis with conditional variational inference.","authors":"Munib Mesinovic, Peter Watkinson, Tingting Zhu","doi":"10.1093/jamia/ocae271","DOIUrl":"10.1093/jamia/ocae271","url":null,"abstract":"<p><strong>Objective: </strong>Machine learning applications for longitudinal electronic health records often forecast the risk of events at fixed time points, whereas survival analysis achieves dynamic risk prediction by estimating time-to-event distributions. Here, we propose a novel conditional variational autoencoder-based method, DySurv, which uses a combination of static and longitudinal measurements from electronic health records to estimate the individual risk of death dynamically.</p><p><strong>Materials and methods: </strong>DySurv directly estimates the cumulative risk incidence function without making any parametric assumptions on the underlying stochastic process of the time-to-event. We evaluate DySurv on 6 time-to-event benchmark datasets in healthcare, as well as 2 real-world intensive care unit (ICU) electronic health records (EHR) datasets extracted from the eICU Collaborative Research (eICU) and the Medical Information Mart for Intensive Care database (MIMIC-IV).</p><p><strong>Results: </strong>DySurv outperforms other existing statistical and deep learning approaches to time-to-event analysis across concordance and other metrics. It achieves time-dependent concordance of over 60% in the eICU case. It is also over 12% more accurate and 22% more sensitive than in-use ICU scores like Acute Physiology and Chronic Health Evaluation (APACHE) and Sequential Organ Failure Assessment (SOFA) scores. 
The predictive capacity of DySurv is consistent and the survival estimates remain disentangled across different datasets.</p><p><strong>Discussion: </strong>Our interdisciplinary framework successfully incorporates deep learning, survival analysis, and intensive care to create a novel method for time-to-event prediction from longitudinal health records. We test our method on several held-out test sets from a variety of healthcare datasets and compare it to existing in-use clinical risk scoring benchmarks.</p><p><strong>Conclusion: </strong>While our method leverages non-parametric extensions to deep learning-guided estimations of the survival distribution, further deep learning paradigms could be explored.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"112-122"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758469/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rion Brattig Correia, Jordan C Rozum, Leonard Cross, Jack Felag, Michael Gallant, Ziqi Guo, Bruce W Herr, Aehong Min, Jon Sanchez-Valle, Deborah Stungis Rocha, Alfonso Valencia, Xuan Wang, Katy Börner, Wendy Miller, Luis M Rocha
Objectives: To report the development of the patient-centered myAURA application and suite of methods designed to aid epilepsy patients, caregivers, and clinicians in making decisions about self-management and care.
Materials and methods: myAURA rests on an unprecedented collection of epilepsy-relevant heterogeneous data resources, such as biomedical databases, social media, and electronic health records (EHRs). We use a patient-centered biomedical dictionary to link the collected data in a multilayer knowledge graph (KG) computed with a generalizable, open-source methodology.
Results: Our approach is based on a novel network sparsification method that uses the metric backbone of weighted graphs to discover important edges for inference, recommendation, and visualization. We demonstrate the approach by studying drug-drug interactions in EHRs, extracting epilepsy-focused digital cohorts from social media, and generating a multilayer KG visualization. We also present our patient-centered design and pilot-testing of myAURA, including its user interface.
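The metric backbone retains exactly those edges whose direct weight equals the shortest-path distance between their endpoints, so every shortest path in the original graph survives sparsification. A toy sketch with a small pure-Python Dijkstra (a stand-in for the project's own open-source implementation; the graph is invented):

```python
import heapq

def dijkstra(adj, src):
    """Shortest-path distances from src in a weighted graph given as
    {node: {neighbor: weight}}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def metric_backbone(adj):
    """Keep only edges whose weight equals the shortest-path distance
    between their endpoints; longer, bypassable edges are dropped."""
    dist = {u: dijkstra(adj, u) for u in adj}
    return {(u, v): w for u in adj for v, w in adj[u].items()
            if u < v and dist[u][v] == w}

# Triangle where the direct a-c edge (weight 3) is longer than the
# indirect a-b-c path (1 + 1 = 2), so it is removed.
adj = {"a": {"b": 1, "c": 3}, "b": {"a": 1, "c": 1}, "c": {"a": 3, "b": 1}}
print(metric_backbone(adj))  # {('a', 'b'): 1, ('b', 'c'): 1}
```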
Discussion: The ability to search and explore myAURA's heterogeneous data sources in a single, sparsified, multilayer KG is highly useful for a range of epilepsy studies and stakeholder support.
Conclusion: Our stakeholder-driven, scalable approach to integrating traditional and nontraditional data sources enables both clinical discovery and data-powered patient self-management in epilepsy and can be generalized to other chronic conditions.
{"title":"myAURA: a personalized health library for epilepsy management via knowledge graph sparsification and visualization.","authors":"Rion Brattig Correia, Jordan C Rozum, Leonard Cross, Jack Felag, Michael Gallant, Ziqi Guo, Bruce W Herr, Aehong Min, Jon Sanchez-Valle, Deborah Stungis Rocha, Alfonso Valencia, Xuan Wang, Katy Börner, Wendy Miller, Luis M Rocha","doi":"10.1093/jamia/ocaf012","DOIUrl":"10.1093/jamia/ocaf012","url":null,"abstract":"<p><strong>Objectives: </strong>Report the development of the patient-centered myAURA application and suite of methods designed to aid epilepsy patients, caregivers, and clinicians in making decisions about self-management and care.</p><p><strong>Materials and methods: </strong>myAURA rests on an unprecedented collection of epilepsy-relevant heterogeneous data resources, such as biomedical databases, social media, and electronic health records (EHRs). We use a patient-centered biomedical dictionary to link the collected data in a multilayer knowledge graph (KG) computed with a generalizable, open-source methodology.</p><p><strong>Results: </strong>Our approach is based on a novel network sparsification method that uses the metric backbone of weighted graphs to discover important edges for inference, recommendation, and visualization. We demonstrate by studying drug-drug interaction from EHRs, extracting epilepsy-focused digital cohorts from social media, and generating a multilayer KG visualization. 
We also present our patient-centered design and pilot-testing of myAURA, including its user interface.</p><p><strong>Discussion: </strong>The ability to search and explore myAURA's heterogeneous data sources in a single, sparsified, multilayer KG is highly useful for a range of epilepsy studies and stakeholder support.</p><p><strong>Conclusion: </strong>Our stakeholder-driven, scalable approach to integrating traditional and nontraditional data sources enables both clinical discovery and data-powered patient self-management in epilepsy and can be generalized to other chronic conditions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"167-181"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758476/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143076198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gabrielle Bunney, Kate Miller, Anna Graber-Naidich, Rana Kabeer, Sean M Bloos, Alexander J Wessels, Melissa A Pasao, Marium Rizvi, Ian P Brown, Maame Yaa A B Yiadom
Objective: The integration of predictive models into live clinical care requires scientific testing before implementation to ensure patient safety. We built and technically implemented a model that predicts which patients require an electrocardiogram (ECG) to screen for heart attacks within 10 minutes of their arrival to the Emergency Department. We developed a structured framework for the in vitro to in vivo translation of the model through implementation as clinical decision support (CDS).
Materials and methods: The CDS ran as a silent pilot for 2 months. We conducted (1) a Technical Component Analysis to ensure each part of the CDS coding functioned as planned, and (2) a Technical Fidelity Analysis to ensure agreement between the CDS's in vivo and the model's in vitro screening decisions.
Results: The Technical Component Analysis indicated several small coding errors in CDS components that were addressed. During this period, the CDS processed 18 335 patient encounters. CDS fidelity to the model reflected raw agreement of 95.5% (CI, 95.2%-95.9%) and kappa of 87.6% (CI, 86.7%-88.6%). Additional coding errors were identified and corrected.
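The two fidelity statistics above, raw agreement and Cohen's kappa, can be computed directly from paired binary decisions. A minimal sketch on synthetic data (not the study's encounters):

```python
import numpy as np

def raw_agreement_and_kappa(in_vivo, in_vitro):
    """Raw agreement and Cohen's kappa between the live CDS decisions
    and the offline model's decisions, both as binary labels."""
    a = np.asarray(in_vivo, dtype=bool)
    b = np.asarray(in_vitro, dtype=bool)
    p_obs = float(np.mean(a == b))
    # Chance agreement from the two marginal positive rates.
    p_exp = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, kappa

# Synthetic screening decisions: 6 of 8 encounters agree.
cds_decisions = [1, 1, 0, 0, 1, 0, 1, 0]
model_decisions = [1, 1, 0, 0, 1, 0, 0, 1]
p_obs, kappa = raw_agreement_and_kappa(cds_decisions, model_decisions)
print(round(p_obs, 3), round(kappa, 3))  # 0.75 0.5
```

Kappa discounts the agreement expected by chance, which is why it is lower than raw agreement here, as it is in the study's figures.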
Discussion: Our structured framework for the in vitro to in vivo translation of our predictive model uncovered ways to improve performance in vivo and the validity of risk assessment decisions. Testing predictive models on live care data and accompanying analyses is necessary to safely implement a predictive model for clinical use.
Conclusion: We developed a method for the translation of our model from in vitro to in vivo that can be utilized with other applications of predictive modeling in healthcare.
{"title":"In vitro to in vivo translation of artificial intelligence for clinical use: screening for acute coronary syndrome to identify ST-elevation myocardial infarction.","authors":"Gabrielle Bunney, Kate Miller, Anna Graber-Naidich, Rana Kabeer, Sean M Bloos, Alexander J Wessels, Melissa A Pasao, Marium Rizvi, Ian P Brown, Maame Yaa A B Yiadom","doi":"10.1093/jamia/ocaf101","DOIUrl":"10.1093/jamia/ocaf101","url":null,"abstract":"<p><strong>Objective: </strong>The integration of predictive models into live clinical care requires scientific testing before implementation to ensure patient safety. We built and technically implemented a model that predicts which patients require an electrocardiogram (ECG) to screen for heart attacks within 10 minutes of their arrival to the Emergency Department. We developed a structured framework for the in vitro to in vivo translation of the model through implementation as clinical decision support (CDS).</p><p><strong>Materials and methods: </strong>The CDS ran as a silent pilot for 2 months. We conducted (1) a Technical Component Analysis to ensure each part of the CDS coding functioned as planned, and (2) a Technical Fidelity Analysis to ensure agreement between the CDS's in vivo and the model's in vitro screening decisions.</p><p><strong>Results: </strong>The Technical Component Analysis indicated several small coding errors in CDS components that were addressed. During this period, the CDS processed 18 335 patient encounters. CDS fidelity to the model reflected raw agreement of 95.5% (CI, 95.2%-95.9%) and kappa of 87.6% (CI, 86.7%-88.6%). Additional coding errors were identified and were corrected.</p><p><strong>Discussion: </strong>Our structured framework for the in vitro to in vivo translation of our predictive model uncovered ways to improve performance in vivo and the validity of risk assessment decisions. 
Testing predictive models on live care data and accompanying analyses is necessary to safely implement a predictive model for clinical use.</p><p><strong>Conclusion: </strong>We developed a method for the translation of our model from in vitro to in vivo that can be utilized with other applications of predictive modeling in healthcare.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"7-14"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758466/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144509199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc
Objectives: Electronic Health Records (EHRs) sampled from different populations can introduce unwanted biases, limit individual-level data sharing, and limit the transferability of both the data and fitted models across population groups. In this context, our main goal is to design an effective method to transfer knowledge between population groups, with computable guarantees for suitability, that can be applied to quantify treatment disparities.
Materials and methods: For a model trained in an embedded feature space of one subgroup, our proposed framework, Optimal Transport-based Transfer Learning for EHRs (OTTEHR), combines feature embedding of the data and unbalanced optimal transport (OT) for domain adaptation to another population group. To test our method, we processed and divided the MIMIC-III and MIMIC-IV databases into multiple population groups using ICD codes and multiple labels.
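OTTEHR applies unbalanced optimal transport to high-dimensional learned embeddings; the transport cost underlying its guarantees can be illustrated in one dimension, where the Wasserstein-1 distance between two equal-size empirical samples is just the mean absolute difference of their sorted values. The embeddings below are invented toy values.

```python
import numpy as np

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D empirical
    distributions: the optimal transport plan matches sorted samples,
    so the cost is the mean absolute difference after sorting.
    (A balanced 1-D toy; OTTEHR uses unbalanced OT on embeddings.)"""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean(np.abs(x - y)))

# One-feature embeddings for a source subgroup and a shifted target
# subgroup: a constant shift makes the distance equal to the shift.
source = np.array([0.0, 1.0, 2.0, 3.0])
target = source + 0.5
print(wasserstein_1d(source, target))  # 0.5
```

Intuitively, the smaller this distance between source and target subgroups, the tighter the generalization bound on transferring a model between them.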
Results: We derive a theoretical bound for the generalization error of our method, and interpret it in terms of the Wasserstein distance, unbalancedness between the source and target domains, and labeling divergence, which can be used as a guide for assessing the suitability of binary classification and regression tasks. In general, our method achieves better accuracy and computational efficiency compared with standard machine learning and transfer learning methods on various tasks. Upon testing our method for populations with different insurance plans, we detect various levels of disparities in hospital length of stay between groups.
Discussion and conclusion: By leveraging tools from OT theory, our proposed framework allows us to compare statistical models on EHR data between different population groups. As a potential application for clinical decision making, we quantify treatment disparities between different population groups. Future directions include applying OTTEHR to broader regression and classification tasks and extending the method to semi-supervised learning.
{"title":"Transport-based transfer learning on Electronic Health Records: application to detection of treatment disparities.","authors":"Wanxin Li, Saad Ahmed, Yongjin P Park, Khanh Dao Duc","doi":"10.1093/jamia/ocaf134","DOIUrl":"10.1093/jamia/ocaf134","url":null,"abstract":"<p><strong>Objectives: </strong>Electronic Health Records (EHRs) sampled from different populations can introduce unwanted biases, limit individual-level data sharing, and make the data and fitted model hardly transferable across different population groups. In this context, our main goal is to design an effective method to transfer knowledge between population groups, with computable guarantees for suitability, and that can be applied to quantify treatment disparities.</p><p><strong>Materials and methods: </strong>For a model trained in an embedded feature space of one subgroup, our proposed framework, Optimal Transport-based Transfer Learning for EHRs (OTTEHR), combines feature embedding of the data and unbalanced optimal transport (OT) for domain adaptation to another population group. To test our method, we processed and divided the MIMIC-III and MIMIC-IV databases into multiple population groups using ICD codes and multiple labels.</p><p><strong>Results: </strong>We derive a theoretical bound for the generalization error of our method, and interpret it in terms of the Wasserstein distance, unbalancedness between the source and target domains, and labeling divergence, which can be used as a guide for assessing the suitability of binary classification and regression tasks. In general, our method achieves better accuracy and computational efficiency compared with standard and machine learning transfer learning methods on various tasks. 
Upon testing our method for populations with different insurance plans, we detect various levels of disparities in hospital duration stay between groups.</p><p><strong>Discussion and conclusion: </strong>By leveraging tools from OT theory, our proposed framework allows to compare statistical models on EHR data between different population groups. As a potential application for clinical decision making, we quantify treatment disparities between different population groups. Future directions include applying OTTEHR to broader regression and classification tasks and extending the method to semi-supervised learning.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"15-25"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758479/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144994218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Richard Noll, Alexandra Berger, Carlo Facchinello, Katharina Stratmann, Jannik Schaaf, Holger Storf
Objective: This study aims to enhance the diagnostic process for rare diseases using case-based reasoning (CBR). CBR compares new cases with historical ones, utilizing both structured and unstructured clinical data.
Materials and methods: The study uses a dataset of 4295 patient cases from the University Hospital Frankfurt. Data were standardized using the OMOP Common Data Model. Three methods (TF, TF-IDF, and TF-IDF with semantic vector embeddings) were employed to represent patient records. Similarity search effectiveness was evaluated using cross-validation to assess diagnostic precision. High-weighted concepts were rated by medical experts for relevance. Additionally, the impact of different levels of ICD-10 code granularity on prediction outcomes was analyzed.
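The TF-IDF retrieval step can be sketched with scikit-learn: vectorize the historical case records, vectorize the new case, and rank by cosine similarity. The case notes below are invented stand-ins for the OMOP-standardized records, and this omits the study's semantic-embedding variant.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical historical case notes (the study uses 4295 real cases).
cases = [
    "recurrent fever joint pain rash",
    "chronic cough weight loss night sweats",
    "joint pain rash photosensitivity fatigue",
    "abdominal pain diarrhea weight loss",
]
new_case = "fever rash joint pain"

vectorizer = TfidfVectorizer()
case_vecs = vectorizer.fit_transform(cases)   # TF-IDF over historical cases
query_vec = vectorizer.transform([new_case])  # same vocabulary and IDF

# Rank historical cases by cosine similarity to the new case;
# the most similar cases drive the diagnostic suggestion.
scores = cosine_similarity(query_vec, case_vecs).ravel()
ranking = scores.argsort()[::-1]
print(cases[ranking[0]])  # the case sharing all four query terms
```

In the study, the diagnoses attached to the top-k retrieved cases (eg, k = 10) supply the candidate rare-disease labels whose precision is then evaluated.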
Results: The TF-IDF method showed a high degree of precision, with an average positive predictive value of 91% in the 10 most similar cases. The differences between the methods were not statistically significant. The expert evaluation rated the medical relevance of high-weighted concepts as moderate. The granularity of ICD-10 coding significantly influences the precision of predictions, with more granular codes showing decreased precision.
Discussion: The methods effectively handle data from multiple medical specialties, suggesting broad applicability. The use of broader ICD-10 codes with high precision in prediction could improve initial diagnostic guidance. The use of Explainable AI could enhance diagnostic transparency, leading to better patient outcomes. Limitations include standardization issues and the need for more comprehensive lab value integration.
Conclusion: While CBR shows promise for rare disease diagnostics, its utility depends on the specific needs of the decision support system and its intended clinical application.
{"title":"Enhancing diagnostic precision for rare diseases using case-based reasoning.","authors":"Richard Noll, Alexandra Berger, Carlo Facchinello, Katharina Stratmann, Jannik Schaaf, Holger Storf","doi":"10.1093/jamia/ocaf092","DOIUrl":"10.1093/jamia/ocaf092","url":null,"abstract":"<p><strong>Objective: </strong>This study aims to enhance the diagnostic process for rare diseases using case-based reasoning (CBR). CBR compares new cases with historical data, utilizing both structured and unstructured clinical data.</p><p><strong>Materials and methods: </strong>The study uses a dataset of 4295 patient cases from the University Hospital Frankfurt. Data were standardized using the OMOP Common Data Model. Three methods-TF, TF-IDF, and TF-IDF with semantic vector embeddings-were employed to represent patient records. Similarity search effectiveness was evaluated using cross-validation to assess diagnostic precision. High-weighted concepts were rated by medical experts for relevance. Additionally, the impact of different levels of ICD-10 code granularity on prediction outcomes was analyzed.</p><p><strong>Results: </strong>The TF-IDF method showed a high degree of precision, with an average positive predictive value of 91% in the 10 most similar cases. The differences between the methods were not statistically significant. The expert evaluation rated the medical relevance of high-weighted concepts as moderate. The granularity of ICD-10 coding significantly influences the precision of predictions, with more granular codes showing decreased precision.</p><p><strong>Discussion: </strong>The methods effectively handle data from multiple medical specialties, suggesting broad applicability. The use of broader ICD-10 codes with high precision in prediction could improve initial diagnostic guidance. The use of Explainable AI could enhance diagnostic transparency, leading to better patient outcomes. 
Limitations include standardization issues and the need for more comprehensive lab value integration.</p><p><strong>Conclusion: </strong>While CBR shows promise for rare disease diagnostics, its utility depends on the specific needs of the decision support system and its intended clinical application.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"98-111"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758460/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144509197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective: Accurately measuring patient similarity is essential for precision medicine, enabling personalized predictive modeling, disease subtyping, and individualized treatment by identifying patients with similar characteristics to an index patient. This study aims to develop an electronic health record-based patient similarity estimation framework to enhance personalized predictive modeling for Acute Kidney Injury (AKI), a complex and life-threatening condition where accurate prediction is critical for timely intervention.
Materials and methods: We introduce Similarity Measurement for Acute Kidney Injury Risk Tracking (SMART), a new patient similarity estimation framework with 3 key enhancements: (1) overlap weighting to adjust similarity scores; (2) distance measure optimization; and (3) feature type weight optimization. These enhancements were evaluated using internal and external validation datasets from 2 tertiary academic hospitals to predict AKI risk across varying group sizes of similar patients.
Results: The study analyzed data from 8637 patients in the reference patient pool and 8542 patients in each of the internal and external test sets. Each enhancement was independently evaluated while controlling for other variables to determine its impact on prediction performance. SMART consistently outperformed 3 baseline models on both the internal and external test sets (P<.05) and demonstrated improved performance in certain subpopulations with unique health profiles compared to a traditional machine learning approach.
Discussion: SMART improves the identification of high-quality similar patient groups, enhancing the accuracy of personalized AKI prediction across various group sizes. By accurately identifying clinically relevant similar patients, clinicians can tailor treatments more effectively, advancing personalized care.
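The abstract names SMART's three enhancements but not their formulas. As a loose illustration of the first one, overlap weighting, here is a toy sketch in which shared feature values that are common across the reference pool contribute less to similarity than rare ones. The function name, the IDF-style weight, and the dict-based patient representation are all assumptions for illustration, not SMART's actual method.

```python
import math

def overlap_weighted_similarity(patient_a, patient_b, reference_pool):
    """Toy overlap-weighted similarity score (hypothetical sketch).

    patient_a / patient_b: dicts mapping feature name -> value.
    reference_pool: list of such dicts; features whose values are
    common across the pool are down-weighted (high overlap), so rare
    shared values dominate the score.
    """
    shared = set(patient_a) & set(patient_b)
    if not shared:
        return 0.0
    score = 0.0
    for feat in shared:
        # Prevalence of patient_a's value for this feature in the pool.
        matches = sum(1 for p in reference_pool
                      if p.get(feat) == patient_a[feat])
        prevalence = matches / len(reference_pool)
        # IDF-style weight: rarer values get larger weights.
        weight = -math.log(max(prevalence, 1e-6))
        if patient_a[feat] == patient_b[feat]:
            score += weight
    return score / len(shared)
```

Under this scheme, two patients who share a rare diagnosis score higher than two who share a common one, which is the intuition behind adjusting similarity scores by overlap.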
{"title":"SMART: a new patient similarity estimation framework for enhanced predictive modeling in acute kidney injury.","authors":"Deyi Li, Alan S L Yu, Dana Y Fuhrman, Mei Liu","doi":"10.1093/jamia/ocaf125","DOIUrl":"10.1093/jamia/ocaf125","url":null,"abstract":"<p><strong>Objective: </strong>Accurately measuring patient similarity is essential for precision medicine, enabling personalized predictive modeling, disease subtyping, and individualized treatment by identifying patients with similar characteristics to an index patient. This study aims to develop an electronic health record-based patient similarity estimation framework to enhance personalized predictive modeling for Acute Kidney Injury (AKI), a complex and life-threatening condition where accurate prediction is critical for timely intervention.</p><p><strong>Materials and methods: </strong>We introduce Similarity Measurement for Acute Kidney Injury Risk Tracking (SMART), a new patient similarity estimation framework with 3 key enhancements: (1) overlap weighting to adjust similarity scores; (2) distance measure optimization; and (3) feature type weight optimization. These enhancements were evaluated using internal and external validation datasets from 2 tertiary academic hospitals to predict AKI risk across varying group sizes of similar patients.</p><p><strong>Results: </strong>The study analyzed data from 8637 patients in the reference patient pool and 8542 patients in each of the internal and external test sets. Each enhancement was independently evaluated while controlling for other variables to determine its impact on prediction performance. 
SMART consistently outperformed 3 baseline models on both the internal and external test sets (P<.05) and demonstrated improved performance in certain subpopulations with unique health profiles compared to a traditional machine learning approach.</p><p><strong>Discussion: </strong>SMART improves the identification of high-quality similar patient groups, enhancing the accuracy of personalized AKI prediction across various group sizes. By accurately identifying clinically relevant similar patients, clinicians can tailor treatments more effectively, advancing personalized care.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"37-48"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144838431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura
Objective: We aim to use large language models (LLMs) to detect mentions of more nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescents with depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes beyond binary classification (eg, depression vs control), based on qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.
Materials and methods: Data were drawn from interviews in which text segments were annotated with different outcome labels. Five open-source LLMs were evaluated for classifying outcomes from the coding framework. Classification experiments were carried out on the original interview transcripts. We then repeated those experiments on versions of the data produced by breaking those segments into conversation turns or by keeping only non-interviewer utterances (monologues).
Results: We used classification models to predict 31 outcomes and 8 derived labels across 3 different text segmentations. Area under the ROC curve scores ranged from 0.6 to 0.9 for the original segmentation and from 0.7 to 1.0 for the monologues and turns.
Discussion: LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. By using clinical data, we also aim to better generalize to clinical settings compared to studies based on public social media data.
Conclusion: Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible and can support the quantification of important outcomes for downstream uses.
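The pipeline this abstract describes — LLM embeddings fed to a downstream classifier, scored by area under the ROC curve — can be illustrated with a toy sketch. The nearest-centroid scorer and the hand-rolled AUROC below are stand-ins chosen for self-containedness, not the authors' actual models; the AUROC function uses the standard rank-sum (Mann-Whitney U) identity.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def score_by_centroid(x, pos_centroid, neg_centroid):
    """Higher score = embedding x lies closer to the positive-class centroid."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, pos_centroid))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, neg_centroid))
    return d_neg - d_pos

def auroc(labels, scores):
    """AUROC via the rank-sum identity: the probability that a random
    positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating scorer yields an AUROC of 1.0 and a constant scorer 0.5, which frames the 0.6-1.0 range reported above.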
{"title":"Using large language models to detect outcomes in qualitative studies of adolescent depression.","authors":"Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura","doi":"10.1093/jamia/ocae298","DOIUrl":"10.1093/jamia/ocae298","url":null,"abstract":"<p><strong>Objective: </strong>We aim to use large language models (LLMs) to detect mentions of nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescent depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes beyond the binary classification (eg, depression vs control) based on qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.</p><p><strong>Materials and methods: </strong>Data were drawn from interviews, where text segments were annotated with different outcome labels. Five different open-source LLMs were evaluated to classify outcomes from the coding framework. Classification experiments were carried out in the original interview transcripts. Furthermore, we repeated those experiments for versions of the data produced by breaking those segments into conversation turns, or keeping non-interviewer utterances (monologues).</p><p><strong>Results: </strong>We used classification models to predict 31 outcomes and 8 derived labels, for 3 different text segmentations. Area under the ROC curve scores ranged between 0.6 and 0.9 for the original segmentation and 0.7 and 1.0 for the monologues and turns.</p><p><strong>Discussion: </strong>LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. 
By using clinical data, we also aim to better generalize to clinical settings compared to studies based on public social media data.</p><p><strong>Conclusion: </strong>Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible, and can be used to support the quantification of important outcomes for downstream uses.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"79-89"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758459/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142814632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seth Russell, Peter E DeWitt, Laura Helmkamp, Kathryn Colborn, Charlotte Gray, Margaret Rebull, Yamila L Sierra, Rachel Greer, Lexi Petruccelli, Sara Shankman, Todd C Hankinson, Fuyong Xing, David J Albers, Tellen D Bennett
Objective: Clinicians currently make decisions about placing an intracranial pressure (ICP) monitor in children with traumatic brain injury (TBI) without the benefit of an accurate clinical decision support tool. The goal of this study was to develop and validate a model that predicts placement of an ICP monitor and updates as new information becomes available.
Materials and methods: A prospective observational cohort study was conducted from September 2014 to January 2024. The setting included one US hospital designated as an American College of Surgeons Level 1 Pediatric Trauma Center. Participants were 389 children with acute TBI admitted to the ICU who had at least one Glasgow Coma Scale (GCS) score ≤ 8 or intubation with at least one GCS-Motor ≤ 5. We excluded children who received ICP monitors prior to arrival, those with GCS = 3 and bilateral fixed, dilated pupils, and those with a do not resuscitate order.
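The inclusion and exclusion criteria above amount to a boolean screen over a few recorded fields. A minimal sketch follows; the field names and dict representation are hypothetical, invented here for illustration only.

```python
def eligible(patient):
    """Toy inclusion/exclusion screen mirroring the cohort criteria above.

    patient: dict with hypothetical keys:
      'gcs_scores' (list of ints), 'intubated' (bool),
      'gcs_motor_scores' (list of ints), 'icp_before_arrival' (bool),
      'bilateral_fixed_dilated_pupils' (bool), 'dnr' (bool).
    """
    # Inclusion: any GCS <= 8, or intubated with any GCS-Motor <= 5.
    include = (any(g <= 8 for g in patient["gcs_scores"])
               or (patient["intubated"]
                   and any(m <= 5 for m in patient["gcs_motor_scores"])))
    # Exclusion: ICP monitor before arrival; GCS of 3 with bilateral
    # fixed, dilated pupils; or a do-not-resuscitate order.
    exclude = (patient["icp_before_arrival"]
               or (patient["gcs_scores"]
                   and min(patient["gcs_scores"]) == 3
                   and patient["bilateral_fixed_dilated_pupils"])
               or patient["dnr"])
    return include and not exclude
```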
Results: Of the 389 participants, 138 received ICP monitoring. Several machine learning models, including a recurrent neural network (RNN), were developed and validated using 4 combinations of input data. The best-performing model, an RNN, achieved an F1 of 0.71 within 720 minutes of hospital arrival; its cumulative F1 from minute 0 to 720 was 0.61. The best-performing non-neural-network model, standard logistic regression, achieved an F1 of 0.36 within 720 minutes of hospital arrival.
Conclusions: These findings will contribute to the design and implementation of a multidisciplinary clinical decision support tool for ICP monitor placement in children with TBI.
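The F1 figures above are the harmonic mean of precision and recall. For reference, a minimal F1 implementation is sketched below; the abstract does not fully specify the authors' time-windowed evaluation protocol, so this shows only the metric itself, not their procedure.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels: harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives (precision or recall
    would otherwise be undefined).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with one true positive, one false positive, and one false negative, precision and recall are both 0.5 and F1 is 0.5.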
{"title":"Predicting intracranial pressure monitor placement in children with traumatic brain injury: a prospective cohort study to develop a clinical decision support tool.","authors":"Seth Russell, Peter E DeWitt, Laura Helmkamp, Kathryn Colborn, Charlotte Gray, Margaret Rebull, Yamila L Sierra, Rachel Greer, Lexi Petruccelli, Sara Shankman, Todd C Hankinson, Fuyong Xing, David J Albers, Tellen D Bennett","doi":"10.1093/jamia/ocaf120","DOIUrl":"10.1093/jamia/ocaf120","url":null,"abstract":"<p><strong>Objective: </strong>Clinicians currently make decisions about placing an intracranial pressure (ICP) monitor in children with traumatic brain injury (TBI) without the benefit of an accurate clinical decision support tool. The goal of this study was to develop and validate a model that predicts placement of an ICP monitor and updates as new information becomes available.</p><p><strong>Materials and methods: </strong>A prospective observational cohort study was conducted from September 2014 to January 2024. The setting included one US hospital designated as an American College of Surgeons Level 1 Pediatric Trauma Center. Participants were 389 children with acute TBI admitted to the ICU who had at least one Glasgow Coma Scale (GCS) score ≤ 8 or intubation with at least one GCS-Motor ≤ 5. We excluded children who received ICP monitors prior to arrival, those with GCS = 3 and bilateral fixed, dilated pupils, and those with a do not resuscitate order.</p><p><strong>Results: </strong>Of the 389 participants, 138 received ICP monitoring. Several machine learning models, including a recurrent neural network (RNN), were developed and validated using 4 combinations of input data. The best performing model, an RNN, achieved an F1 of 0.71 within 720 minutes of hospital arrival. The cumulative F1 of the RNN from minute 0 to 720 was 0.61. 
The best performing non-neural network model, standard logistic regression, achieved an F1 of 0.36 within 720 minutes of hospital arrival.</p><p><strong>Conclusions: </strong>These findings will contribute to design and implementation of a multidisciplinary clinical decision support tool for ICP monitor placement in children with TBI.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":"182-192"},"PeriodicalIF":4.6,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12758473/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144762166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}