Self-supervised identification and elimination of harmful datasets in distributed machine learning for medical image analysis
Pub Date: 2025-02-15
DOI: 10.1038/s41746-025-01499-0
Journal: NPJ Digital Medicine
Raissa Souza, Emma A. M. Stanley, Anthony J. Winder, Chris Kang, Kimberly Amador, Erik Y. Ohara, Gabrielle Dagasso, Richard Camicioli, Oury Monchi, Zahinoor Ismail, Matthias Wilms, Nils D. Forkert
Distributed learning enables collaborative machine learning model training without requiring cross-institutional data sharing, thereby addressing privacy concerns. However, variability in local quality control can negatively impact model performance, while systematic human visual inspection is time-consuming and may violate the goal of keeping data inaccessible outside acquisition centers. This work proposes a novel self-supervised method to identify and eliminate harmful data fully automatically during distributed learning model training. Harmful data is defined as samples that, when included in training, increase misdiagnosis rates. The method was tested using neuroimaging data from 83 centers for Parkinson’s disease classification with simulated inclusion of a few harmful data samples. The proposed method reliably identified harmful images, with centers providing only harmful datasets being easier to identify than single harmful images within otherwise good datasets. While only evaluated using neuroimaging data, the presented method is application-agnostic and presents a step towards automated quality control in distributed learning.

Identifying major depressive disorder in older adults through naturalistic driving behaviors and machine learning
Pub Date: 2025-02-15
DOI: 10.1038/s41746-025-01500-w
Journal: NPJ Digital Medicine
Chen Chen, David C. Brown, Noor Al-Hammadi, Sayeh Bayat, Anne Dickerson, Brenda Vrkljan, Matthew Blake, Yiqi Zhu, Jean-Francois Trani, Eric J. Lenze, David B. Carr, Ganesh M. Babulal
Depression in older adults is often underdiagnosed and has been linked to adverse outcomes, including motor vehicle crashes. With a growing population of older drivers in the United States, innovations in screening methods are needed to identify older adults at greatest risk of decline. This study used machine learning techniques to analyze real-world naturalistic driving data to identify depression status in older adults and examined whether specific demographics and medications improved model performance. We analyzed two years of GPS data from 157 older adults, including 81 with major depressive disorder, using XGBoost and logistic regression models. The top-performing model achieved an area under the curve of 0.86 with driving features combined with total medication use. These findings suggest that naturalistic driving data holds high potential as a functional digital neurobehavioral marker for AI-based identification of depression in older adults on a national scale, thereby ensuring equitable access to treatment.
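
The area under the curve reported in the abstract above can be read as the probability that a randomly chosen participant with depression receives a higher model score than a randomly chosen participant without it. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: iterable of 0/1 ground-truth values (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: four participants, two per class.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))
```

The quadratic loop is kept for clarity; for a cohort of 157 participants it is more than fast enough, and a rank-based implementation would give the same value.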

Self supervised artificial intelligence predicts poor outcome from primary cutaneous squamous cell carcinoma at diagnosis
Pub Date: 2025-02-15
DOI: 10.1038/s41746-025-01496-3
Journal: NPJ Digital Medicine 8(1), 105
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11830021/pdf/
Nicolas Coudray, Michelle C Juarez, Maressa C Criscito, Adalberto Claudio Quiros, Reason Wilken, Stephanie R Jackson Cullison, Mary L Stevenson, Nicole A Doudican, Ke Yuan, Jamie D Aquino, Daniel M Klufas, Jeffrey P North, Siegrid S Yu, Fadi Murad, Emily Ruiz, Chrysalyne D Schmults, Cristian D Cardona Machado, Javier Cañueto, Anirudh Choudhary, Alysia N Hughes, Alyssa Stockard, Zachary Leibovit-Reiben, Aaron R Mangold, Aristotelis Tsirigos, John A Carucci
Primary cutaneous squamous cell carcinoma (cSCC) is responsible for ~10,000 deaths annually in the United States. Stratification of risk of poor outcome at initial biopsy would significantly impact clinical decision-making during the initial postoperative period, when intervention has been shown to be most effective. Using whole-slide images (WSI) from 163 patients from 3 institutions, we developed a self-supervised deep-learning model to predict poor outcomes in cSCC patients from histopathological features at initial diagnosis, and validated it using WSI from 563 patients, collected from two other academic institutions. For disease-free survival prediction, the model attained a concordance index of 0.73 in the development cohort and 0.84 in the Mayo cohort. The model's interpretability revealed that features like poor differentiation and deep invasion were strongly associated with poor prognosis. Furthermore, the model is effective in stratifying risk among BWH T2a and AJCC T2, known for outcome heterogeneity.
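
The concordance index quoted in the abstract above measures how often a model ranks two patients in the right order: among comparable pairs, the patient who relapsed earlier should have the higher predicted risk. The following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. It is a generic textbook formulation, not the authors' evaluation code, and the function name and toy values are assumptions made for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up time per patient.
    events: 1 if the event (e.g. relapse) was observed, 0 if censored.
    risk_scores: model output, higher meaning higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: the third patient is censored at time 3.
print(concordance_index([1, 2, 3], [1, 1, 0], [3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.73 and 0.84 should be read.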

Improving medical machine learning models with generative balancing for equity and excellence
Pub Date: 2025-02-14
DOI: 10.1038/s41746-025-01438-z
Journal: NPJ Digital Medicine
Brandon Theodorou, Benjamin Danek, Venkat Tummala, Shivam Pankaj Kumar, Bradley Malin, Jimeng Sun
Applying machine learning to clinical outcome prediction is challenging due to imbalanced datasets and sensitive tasks that contain rare yet critical outcomes and where equitable treatment across diverse patient groups is essential. Despite attempts, biases in predictions persist, driven by disparities in representation and exacerbated by the scarcity of positive labels, perpetuating health inequities. This paper introduces FairPlay, a synthetic data generation approach leveraging large language models, to address these issues. FairPlay enhances algorithmic performance and reduces bias by creating realistic, anonymous synthetic patient data that improves representation and augments dataset patterns while preserving privacy. Through experiments on multiple datasets, we demonstrate that FairPlay boosts mortality prediction performance across diverse subgroups, achieving up to a 21% improvement in F1 score without requiring additional data or altering downstream training pipelines. Furthermore, FairPlay consistently reduces subgroup performance gaps, as shown by universal improvements in performance and fairness metrics across four experimental setups.

Leveraging large language models for academic conference organization
Pub Date: 2025-02-14
DOI: 10.1038/s41746-025-01492-7
Journal: NPJ Digital Medicine
Yuan Luo, Yikuan Li, Omolola Ogunyemi, Eileen Koski, Blanca E. Himes
We piloted the use of large language models (LLMs) for organizing the AMIA 2024 Informatics Summit. LLMs were prompt engineered to develop algorithms for reviewer assignment, group presentations into sessions, suggest session titles, and provide one-sentence summaries for presentations. These tools substantially reduced planning time while enhancing the coherence and efficiency of conference organization. Our experience shows the potential of generative AI and LLMs to complement human expertise in academic conference planning.

Improving musculoskeletal care with AI enhanced triage through data driven screening of referral letters
Pub Date: 2025-02-14
DOI: 10.1038/s41746-025-01495-4
Journal: NPJ Digital Medicine 8(1), 98
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11825706/pdf/
Tjardo Daniël Maarseveen, Herman Kasper Glas, Josien Veris-van Dieren, Erik van den Akker, Rachel Knevel
Musculoskeletal complaints account for 30% of GP consultations, with many referred to rheumatology clinics via letters. This study developed a machine learning (ML) pipeline to prioritize referrals by identifying rheumatoid arthritis (RA), osteoarthritis, fibromyalgia, and patients requiring long-term care. Using 8044 referral letters from 5728 patients across 12 clinics, we trained and validated ML models in two large centers and tested their generalizability in the remaining ten. The models were robust, with RA achieving an AUC-ROC of 0.78 (CI: 0.74-0.83), osteoarthritis 0.71 (CI: 0.67-0.74), fibromyalgia 0.81 (CI: 0.77-0.85), and chronic follow-up 0.63 (CI: 0.61-0.66). The RA-classifier outperformed manual referral systems, as it prioritised RA over non-RA cases (P < 0.001), while the manual referral system could not differentiate between the two. The other classifiers showed similar prioritisation improvements, highlighting the potential to enhance care efficiency, reduce clinician workload, and facilitate earlier specialized care. Future work will focus on building clinical decision-support tools.

Personalized decision making for coronary artery disease treatment using offline reinforcement learning
Pub Date: 2025-02-14
DOI: 10.1038/s41746-025-01498-1
Journal: NPJ Digital Medicine 8(1), 99
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11825836/pdf/
Peyman Ghasemi, Matthew Greenberg, Danielle A Southern, Bing Li, James A White, Joon Lee
Choosing optimal revascularization strategies for patients with obstructive coronary artery disease (CAD) remains a clinical challenge. While randomized controlled trials offer population-level insights, gaps remain regarding personalized decision-making for individual patients. We applied off-policy reinforcement learning (RL) to a composite data model from 41,328 unique patients with angiography-confirmed obstructive CAD. In an offline setting, we estimated optimal treatment policies and evaluated these policies using weighted importance sampling. Our findings indicate that RL-guided therapy decisions outperformed physician-based decision making, with RL policies achieving up to 32% improvement in expected rewards based on composite major cardiovascular events outcomes. Additionally, we introduced methods to ensure that RL CAD treatment policies remain compatible with locally achievable clinical practice models, presenting an interpretable RL policy with a limited number of states. Overall, this novel RL-based clinical decision support tool, RL4CAD, demonstrates potential to optimize care in patients with obstructive CAD referred for invasive coronary angiography.
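
Weighted importance sampling, the evaluation technique named in the abstract above, estimates how a new policy would have performed using only trajectories logged under the existing (behavior) policy. Each logged trajectory is reweighted by how much more likely the new policy was to take the recorded actions, and the weights are normalized so they sum to one. The following is a minimal trajectory-level sketch in plain Python. It assumes the per-step action probabilities under both policies are already known, which in practice means estimating the behavior policy from data. All names and the toy trajectories are hypothetical and not taken from RL4CAD.

```python
def weighted_importance_sampling(trajectories, gamma=1.0):
    """Trajectory-level weighted importance sampling (WIS) estimate.

    trajectories: list of trajectories; each trajectory is a list of
        (eval_prob, behavior_prob, reward) tuples, one per time step, where
        eval_prob is the evaluation policy's probability of the logged action
        and behavior_prob is the behavior policy's probability of it.
    gamma: discount factor applied to rewards.
    """
    weights = []
    returns = []
    for trajectory in trajectories:
        ratio = 1.0          # cumulative importance ratio for this trajectory
        discounted = 0.0     # discounted return actually observed
        for step, (eval_prob, behavior_prob, reward) in enumerate(trajectory):
            if behavior_prob <= 0.0:
                raise ValueError("logged action must have nonzero behavior probability")
            ratio *= eval_prob / behavior_prob
            discounted += (gamma ** step) * reward
        weights.append(ratio)
        returns.append(discounted)

    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("evaluation policy never agrees with the logged actions")
    # Normalizing by the sum of weights (not the count) is what makes this
    # the weighted, lower-variance variant of importance sampling.
    return sum(w * g for w, g in zip(weights, returns)) / total_weight


# Hypothetical toy log: two single-step trajectories.
toy_log = [
    [(0.5, 0.5, 1.0)],   # weight 1.0, return 1.0
    [(1.0, 0.5, 0.0)],   # weight 2.0, return 0.0
]
print(weighted_importance_sampling(toy_log))
```

Normalizing by the total weight introduces a small bias but usually reduces variance substantially compared with ordinary importance sampling, which is one common reason for choosing it in offline clinical settings.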

Ecosystems and monopolies in digital surgery
Pub Date: 2025-02-12
DOI: 10.1038/s41746-024-01379-z
Journal: NPJ Digital Medicine
Andrew Yiu, Kapil Sahnan
The ongoing U.S. v. Apple lawsuit demonstrates the potential for an ‘ecosystem’ to monopolize hardware, software, and/or services. This raises important issues for the surgical community with the growing adoption of digital surgery solutions, such as surgical robots, that offer industry unprecedented access to, and control over, surgical data. Surgeons must understand the significance of this data and ensure patient benefit is central to the ongoing digital transformation of surgery.

Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports
Pub Date: 2025-02-12
DOI: 10.1038/s41746-025-01488-3
Journal: NPJ Digital Medicine
Su Hwan Kim, Severin Schramm, Lisa C. Adams, Rickmer Braren, Keno K. Bressem, Matthias Keicher, Paul-Sören Platzek, Karolin Johanna Paprottka, Claus Zimmer, Dennis M. Hedderich, Benedikt Wiestler
Recent advancements in large language models (LLMs) have created new ways to support radiological diagnostics. While both open-source and proprietary LLMs can address privacy concerns through local or cloud deployment, open-source models provide advantages in continuity of access and potentially lower costs. This study evaluated the diagnostic performance of fifteen open-source LLMs and one closed-source LLM (GPT-4o) in 1,933 cases from the Eurorad library. LLMs provided differential diagnoses based on clinical history and imaging findings. Responses were considered correct if the true diagnosis appeared in the top three suggestions. Models were further tested on 60 non-public brain MRI cases from a tertiary hospital to assess generalizability. In both datasets, GPT-4o demonstrated superior performance, closely followed by Llama-3-70B, revealing how open-source LLMs are rapidly closing the gap to proprietary models. Our findings highlight the potential of open-source LLMs as decision support tools for radiological differential diagnosis in challenging, real-world cases.
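
The scoring rule described in the abstract above, where a response counts as correct if the true diagnosis appears among the top three suggestions, is a top-k accuracy with k = 3. The following is a minimal sketch in plain Python. It assumes diagnoses can be compared by case-insensitive exact string match, which is a simplification introduced here for illustration; the abstract does not state how matches were adjudicated. The function name and the example cases are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_diagnoses, k=3):
    """Fraction of cases whose true diagnosis is among the top-k suggestions.

    ranked_predictions: one list of suggested diagnoses per case, best first.
    true_diagnoses: the reference diagnosis for each case.
    Matching is case-insensitive exact string equality (an assumption).
    """
    if len(ranked_predictions) != len(true_diagnoses):
        raise ValueError("one prediction list is required per case")
    if not true_diagnoses:
        raise ValueError("at least one case is required")

    hits = 0
    for suggestions, truth in zip(ranked_predictions, true_diagnoses):
        top_k = [s.strip().lower() for s in suggestions[:k]]
        if truth.strip().lower() in top_k:
            hits += 1
    return hits / len(true_diagnoses)


# Hypothetical example cases, not drawn from Eurorad.
predictions = [
    ["glioblastoma", "metastasis", "abscess", "lymphoma"],
    ["meningioma", "schwannoma", "metastasis"],
]
truths = ["Abscess", "lymphoma"]
print(top_k_accuracy(predictions, truths))
```

Exact matching undercounts correct answers phrased as synonyms, so a real evaluation would need either a normalized vocabulary or expert review of near matches.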

Implementation report on pioneering federated data access for the German National Emergency Department Data Registry
Pub Date: 2025-02-11
DOI: 10.1038/s41746-025-01481-w
Journal: NPJ Digital Medicine
Jonas Bienzeisler, Alexander Kombeiz, Saskia Ehrentreich, Ronny Otto, Wiebke Schirrmeister, Marco Pegoraro, Dominik Brammen, Behrus Puladi, Rainer Röhrig, Raphael W Majeed
Continuous access to electronic health records will fuel the digital transformation of medicine. For data-sharing initiatives, the challenge lies in ensuring data access aligns with the interests of data holders. Federated data access authorization, where data remains controlled locally, may offer a solution to balance these interests. This paper reports on a digital health implementation of the federated data access authorization system used in the German National Emergency Department Data Registry. Using data from 2017 to 2024, we analyzed the system’s effectiveness in managing data access in a nationwide research network of 58 emergency departments. The system facilitated access to more than 7.9 million records, and 75% of data access queries were authorized within 15 days. The system also supports periodic queries, enabling recurring real-time access. Query volumes grew from 15 to over 23,000 by 2024, with completion rates of 86%. The system may thus serve as a blueprint for data-sharing initiatives worldwide.