Patrick Rockenschaub, Adam Hilbert, Tabea Kossen, Paul Elbers, Falk von Dincklage, Vince Istvan Madai, Dietmar Frey
Critical Care Medicine, pages 1710-1721. Published 2024-11-01 (Epub 2024-07-03). DOI: 10.1097/CCM.0000000000006359 (https://doi.org/10.1097/CCM.0000000000006359). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11469625/pdf/
The Impact of Multi-Institution Datasets on the Generalizability of Machine Learning Prediction Models in the ICU.
Objectives: To evaluate the transferability of deep learning (DL) models for the early detection of adverse events to previously unseen hospitals.
Design: Retrospective observational cohort study utilizing harmonized intensive care data from four public datasets.
Setting: ICUs across Europe and the United States.
Patients: Adult patients admitted to the ICU for at least 6 hours whose data were of adequate quality.
Interventions: None.
Measurements and main results: Using carefully harmonized data from a total of 334,812 ICU stays, we systematically assessed the transferability of DL models for three common adverse events: death, acute kidney injury (AKI), and sepsis. We tested whether training on more than one data source and/or algorithmically optimizing for generalizability during training improves model performance at new hospitals. Models achieved a high area under the receiver operating characteristic curve (AUROC) for mortality (0.838-0.869), AKI (0.823-0.866), and sepsis (0.749-0.824) at the training hospital. As expected, AUROC dropped when models were applied at other hospitals, sometimes by as much as 0.200. Training on more than one dataset mitigated the performance drop, with multicenter models performing roughly on par with the best single-center model. Dedicated methods promoting generalizability did not noticeably improve performance in our experiments.
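The evaluation described above compares AUROC at the training hospital with AUROC at hospitals not seen during training. The following Python sketch illustrates that comparison on toy data. It is not the authors' code: the function names (`auroc`, `leave_one_site_out`), the site labels, and all labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leave_one_site_out(sites: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training sites, held-out site) pairs, one per site."""
    for held_out in sites:
        yield [s for s in sites if s != held_out], held_out


# Hypothetical labels and model scores (not taken from the study).
train_site = ([0, 0, 1, 1, 0, 1], [0.1, 0.2, 0.8, 0.9, 0.3, 0.7])
new_site = ([0, 0, 1, 1, 0, 1], [0.1, 0.6, 0.8, 0.4, 0.3, 0.7])

internal = auroc(*train_site)  # performance at the training hospital
external = auroc(*new_site)    # performance at an unseen hospital
print(f"internal AUROC: {internal:.3f}")             # 1.000
print(f"external AUROC: {external:.3f}")             # 0.889
print(f"transfer drop:  {internal - external:.3f}")  # 0.111

# Placeholder site names; each split trains on three sites and tests on one.
for train_sites, held_out in leave_one_site_out(["A", "B", "C", "D"]):
    print(f"train on {train_sites}, evaluate on {held_out}")
```

The pairwise AUROC is quadratic in the number of stays and is meant only to make the definition explicit; for data at the scale reported here, a rank-based library implementation would be the practical choice.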
Conclusions: Our results emphasize the importance of diverse training data for DL-based risk prediction. They suggest that as data from more hospitals become available for training, models may become increasingly generalizable. Even so, good performance at a new hospital still depended on the inclusion of compatible hospitals during training.
About the journal:
Critical Care Medicine is the premier peer-reviewed, scientific publication in critical care medicine. Directed to those specialists who treat patients in the ICU and CCU, including chest physicians, surgeons, pediatricians, pharmacists/pharmacologists, anesthesiologists, critical care nurses, and other healthcare professionals, Critical Care Medicine covers all aspects of acute and emergency care for the critically ill or injured patient.
Each issue presents critical care practitioners with clinical breakthroughs that lead to better patient care, the latest news on promising research, and advances in equipment and techniques.