Improving Risk Prediction of Methicillin-Resistant Staphylococcus aureus Using Machine Learning Methods With Network Features: Retrospective Development Study.

JMIR AI. Pub Date: 2024-05-16. DOI: 10.2196/48067

Methun Kamruzzaman, Jack Heavey, Alexander Song, Matthew Bielskas, Parantapa Bhattacharya, Gregory Madden, Eili Klein, Xinwei Deng, Anil Vullikanti

JMIR AI, volume 3, e48067. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11140275/pdf/


Abstract

Background: Health care-associated infections due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile (the cause of C. difficile infection, CDI), place a significant burden on our health care infrastructure.

Objective: Screening for MDROs is an important mechanism for preventing spread but is resource intensive. The objective of this study was to develop automated tools that can predict colonization or infection risk using electronic health record (EHR) data, provide useful information to aid infection control, and guide empiric antibiotic coverage.

Methods: We retrospectively developed a machine learning model to detect MRSA colonization and infection in undifferentiated hospitalized patients at the University of Virginia Hospital at the time of sample collection. We built the model from clinical and nonclinical features derived from on-admission and throughout-stay information in the patient's EHR data. In addition, we used a class of features derived from contact networks in EHR data; these network features capture patients' contacts with providers and other patients, improving model interpretability and accuracy for predicting the outcome of surveillance tests for MRSA. Finally, we explored heterogeneous models tailored to different patient subpopulations (for example, those admitted to an intensive care unit or emergency department, or those with specific testing histories); these perform better.
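As a rough illustration of the contact-network idea in the Methods, the sketch below derives two per-patient features from a toy contact log: the number of distinct providers a patient saw, and the number of other patients who share at least one provider with that patient. The record layout, the function name `network_features`, and both feature definitions are assumptions made for illustration; the abstract does not say how the study's network features were actually computed.

```python
from collections import defaultdict

def network_features(contacts):
    """Compute simple per-patient features from (patient, provider) pairs.

    `contacts` is a hypothetical list of (patient_id, provider_id) tuples;
    the study's real EHR schema is not described in the abstract.
    """
    providers_of = defaultdict(set)   # patient  -> providers seen
    patients_of = defaultdict(set)    # provider -> patients seen
    for patient, provider in contacts:
        providers_of[patient].add(provider)
        patients_of[provider].add(patient)

    features = {}
    for patient, providers in providers_of.items():
        # Collect every patient who shares at least one provider.
        neighbours = set()
        for provider in providers:
            neighbours |= patients_of[provider]
        neighbours.discard(patient)  # a patient is not their own contact
        features[patient] = {
            "provider_degree": len(providers),
            "shared_provider_patients": len(neighbours),
        }
    return features

# Toy contact log, invented for this example.
toy_contacts = [
    ("p1", "nurse_a"), ("p1", "dr_x"),
    ("p2", "nurse_a"),
    ("p3", "dr_x"), ("p3", "dr_y"),
    ("p4", "dr_y"),
]
feats = network_features(toy_contacts)
for patient_id in sorted(feats):
    print(patient_id, feats[patient_id])
```

Features of this kind could be appended to the clinical and nonclinical columns before model fitting; whether the study used degree-style counts, time-windowed contacts, or something richer is not stated here.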

Results: We found that penalized logistic regression performs better than the other methods, and that this model's area under the receiver operating characteristic curve (ROC-AUC) improves by nearly 11% when we apply a second-degree polynomial transformation to the features. Significant features for predicting MDRO risk include antibiotic use, surgery, use of devices, dialysis, the patient's comorbid conditions, and network features. Among these, network features add the most value and improve the model's performance by at least 15%. The penalized logistic regression model with the same feature transformation also performs better than other models for specific patient subpopulations.
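To make the modeling recipe in the Results concrete, here is a minimal, dependency-free sketch of a second-degree polynomial feature expansion, a logistic regression fitted by gradient descent, and a rank-based ROC-AUC. Everything specific is an assumption: the abstract does not state the penalty type or strength (an L2 penalty is assumed here), the optimizer, or the software used, and the four-point data set is contrived so that only the interaction term separates the classes. The helper names `poly2`, `fit_l2_logistic`, and `roc_auc` are hypothetical.

```python
import math
from itertools import combinations_with_replacement

def poly2(row):
    """Degree-2 expansion: original features plus all squares and pairwise products."""
    return list(row) + [a * b for a, b in combinations_with_replacement(row, 2)]

def fit_l2_logistic(X, y, penalty=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss + (penalty / 2) * ||w||^2.

    The L2 penalty and all hyperparameters are illustrative assumptions.
    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * (gj + penalty * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def train_auc(X, y):
    """Fit on (X, y) and score the same rows; training-set AUC only."""
    w, b = fit_l2_logistic(X, y)
    scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
    return roc_auc(scores, y)

# Contrived XOR-style data: the label depends only on the product x1 * x2.
X_raw = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [0, 1, 1, 0]

auc_linear = train_auc(X_raw, y)
auc_poly = train_auc([poly2(r) for r in X_raw], y)
print("linear features AUC:", auc_linear)
print("degree-2 features AUC:", auc_poly)
```

The toy comparison only shows why interaction terms can help a linear classifier; it uses training-set AUC on four points and says nothing about the size of the improvement reported in the study.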

Conclusions: Our study shows that MRSA risk prediction can be conducted quite effectively by machine learning methods using clinical and nonclinical features derived from EHR data. Network features are the most predictive and provide significant improvement over prior methods. Furthermore, heterogeneous prediction models for different patient subpopulations enhance the model's performance.


