Pub Date: 2025-12-19 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0001141
Maxime Griot, Jean Vanderdonckt, Demet Yuksel
Electronic Health Records (EHRs) have improved access to patient information but substantially increased clinicians' documentation workload. Large Language Models (LLMs) offer a potential means to reduce this burden, yet real-world deployments in live hospital systems remain limited. We implemented a secure, GDPR-compliant, on-premises LLM assistant integrated into the Epic EHR at a European university hospital. The system uses Qwen3-235B with Retrieval Augmented Generation to deliver context-aware answers drawing on structured patient data, internal and regional clinical documents, and medical literature. A one-month pilot with 28 physicians across nine specialties demonstrated high engagement, with 64% of participants using the assistant daily and generating 482 multi-turn conversations. The most common tasks were summarization, information retrieval, and note drafting, which together accounted for over 70% of interactions. Following the pilot, the system was deployed hospital-wide and adopted by 1,028 users who generated 14,910 conversations over five months, with more than half of clinicians using it at least weekly. Usage remained concentrated on information access and documentation support, indicating stable incorporation into everyday clinical workflows. Feedback volume decreased compared with the pilot, suggesting that routine use diminishes voluntary reporting and underscoring the need for complementary automated monitoring strategies. These findings demonstrate that large-scale integration of LLMs into clinical environments is technically feasible and can achieve sustained use when embedded directly within EHR workflows and governed by strong privacy safeguards. The observed patterns of engagement show that such systems can deliver consistent value in information retrieval and documentation, providing a replicable model for responsible clinical AI deployment.
Implementation of large language models in electronic health records. PLOS Digital Health 4(12): e0001141. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716761/pdf/
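The retrieval step of such a Retrieval Augmented Generation pipeline can be sketched with a stdlib-only toy that ranks candidate documents by bag-of-words cosine similarity before handing the top hits to the model as grounding context. The deployed system's actual retriever, embeddings, and document stores are not described at this level of detail, so every name and document below is illustrative:

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words term counts; a real system would use dense embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank documents by similarity to the query; the top-k would be
    # inserted into the LLM prompt as grounding context.
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]
```

A production retriever would operate over chunked clinical documents and structured patient data, but the rank-then-prompt structure is the same.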
Pub Date: 2025-12-19 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0001125
Simeng Ma, Zhaowen Nie, Mengyuan Zhang, Junhua Mei, Enqi Zhou, Zhiyi Hu, Honggang Lv, Qian Gong, Gaohua Wang, Huiling Wang, Bo Du, Jun Yang, Zhongchun Liu
Depression is clinically and biologically heterogeneous, mandating classification strategies for personalized medicine. This study explored depression subtypes using metabolomics data from the UK Biobank and validated the subtypes in the Whitehall II cohort. The five-step analysis included: (1) identification of distinct subtypes using non-negative matrix factorization (NMF) and four machine learning algorithms; (2) genome-wide association studies (GWAS) to examine associations across subtypes and controls; (3) comparison of clinical characteristics across subtypes; (4) development of 24 subtype-specific diagnostic models and validation in an independent cohort; and (5) construction and comparison of metabolic networks across subtypes. Cluster analysis of 249 metabolomic indicators in individuals with current depressive episodes (n = 7,945) identified three metabolic subtypes of depression. Subtype 1 was characterized by fatty acid dysregulation, subtype 3 by a hyperlipidemia phenotype, and subtype 2 by an intermediate phenotype. Metabolic subtypes were not associated with SNPs. Diagnostic models built using the 249 metabolic indicators yielded an area under the curve (AUC) of 0.644 for the total depression sample and 0.785, 0.817, and 0.942 for subtypes 1, 2, and 3, respectively. Twenty-three additional diagnostic models based on combinations of metabolic indicators improved performance by 12.8-39.6% over a binary classification model. Metabolic networks significantly differed between each subtype and healthy controls but not between the total depressed group and controls. This study defines distinct metabolic subtypes of depression. Future research should combine high-throughput metabolomics with prospectively established depression cohorts and tailored interventions to explore subtype-specific diagnostic and therapeutic biomarkers.
Towards precision psychiatry: Metabolomics identifies three biological subtypes of depression. PLOS Digital Health 4(12): e0001125. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716697/pdf/
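The subtype-discovery step rests on non-negative matrix factorization. A minimal stdlib sketch of the classic multiplicative-update algorithm (Lee and Seung) illustrates the idea; the study's actual pipeline, feature matrix, and rank selection are far richer, and the toy matrix here is purely illustrative:

```python
def matmul(A, B):
    # Plain-list matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(V, rank, iters=200, eps=1e-9):
    # Factor non-negative V (participants x metabolites) into W @ H
    # with multiplicative updates; eps guards against zero division.
    n, m = len(V), len(V[0])
    # Deterministic, slightly asymmetric non-negative initialization.
    W = [[0.5 + 0.01 * (i + k) for k in range(rank)] for i in range(n)]
    H = [[0.5 + 0.01 * (k + j) for j in range(m)] for k in range(rank)]
    for _ in range(iters):
        WtV, WtWH = matmul(transpose(W), V), matmul(transpose(W), matmul(W, H))
        H = [[H[k][j] * WtV[k][j] / (WtWH[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        VHt, WHHt = matmul(V, transpose(H)), matmul(W, matmul(H, transpose(H)))
        W = [[W[i][k] * VHt[i][k] / (WHHt[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H

def frob_err(V, W, H):
    # Squared Frobenius reconstruction error.
    WH = matmul(W, H)
    return sum((V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))
```

Each participant could then be assigned to the subtype indexed by the largest entry in their row of W.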
Pub Date: 2025-12-19 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0001156
Hanu Chaudhari, Christopher Meaney, Kulamakan Kulasegaram, Fok-Han Leung
Creating high-quality medical examinations is challenging due to time, cost, and training requirements. This study evaluates the use of ChatGPT 4.0 (ChatGPT-4) in generating medical exam questions for postgraduate family medicine (FM) trainees. The objectives were to develop a standardized method for creating postgraduate multiple-choice medical exam questions using ChatGPT-4 and to compare the effectiveness of large language model (LLM) generated questions with those created by human experts. Eight academic FM physicians rated multiple-choice questions (MCQs) generated by humans and by ChatGPT-4 across four categories: 1) human-generated, 2) ChatGPT-4 cloned, 3) ChatGPT-4 novel, and 4) ChatGPT-4 generated questions edited by a human expert. Raters scored each question on 17 quality domains. Quality scores were compared using linear mixed-effects models. Both ChatGPT-4 and human-generated questions were rated as high quality and addressed higher-order thinking. Human-generated questions were less likely to be perceived as artificial intelligence (AI) generated than ChatGPT-4 generated questions. For several quality domains, ChatGPT-4 was non-inferior (at a 10% margin), but not superior, to human-generated questions. ChatGPT-4 can create high-quality medical exam questions that are, for certain quality domains, non-inferior to those developed by human experts. LLMs can assist in generating and appraising educational content, leading to potential cost and time savings.
Evaluating ChatGPT-4 in the development of family medicine residency examinations. PLOS Digital Health 4(12): e0001156. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716725/pdf/
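The non-inferiority comparison at a 10% margin can be illustrated with a simple two-sample sketch: AI-generated questions are declared non-inferior when the lower 95% confidence bound for the mean quality difference stays above the negative margin. This is a normal-approximation toy, not the linear mixed-effects analysis the study actually used, and the scores and margin below are hypothetical:

```python
import math
import statistics

def noninferior(ai_scores, human_scores, margin, z=1.96):
    # Non-inferior if the lower bound of the ~95% CI for
    # (mean_ai - mean_human) lies above -margin.
    diff = statistics.mean(ai_scores) - statistics.mean(human_scores)
    se = math.sqrt(statistics.variance(ai_scores) / len(ai_scores)
                   + statistics.variance(human_scores) / len(human_scores))
    return diff - z * se > -margin
```

On a 5-point rating scale, a 10% margin corresponds to 0.5 points; a mixed-effects model additionally accounts for rater and question effects that this sketch ignores.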
Pub Date: 2025-12-16 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0001120
Nguyen Quang Huy, Thanh-Ha Do, Nguyen Van De, Hoang Kim Giap, Vu Huyen Tram
This paper introduces a novel, expert-annotated single-cell image dataset for thyroid cancer diagnosis, comprising 3,419 individual cell images extracted from high-resolution histopathological slides and annotated with nine clinically significant nuclear features. The dataset, collected and annotated in collaboration with pathologists at the 108 Military Central Hospital (Vietnam), is a significant resource for advancing research in automated cytological analysis. We establish a series of robust deep-learning baseline pipelines for multi-label classification on this dataset. These baselines incorporate ConvNeXt, Vision Transformer (ViT), and ResNet backbones, along with techniques to address class imbalance, including conditional CutMix, weighted sampling, and SPA loss with Label Pairwise Regularization (LPR). Experiments demonstrate the performance of the proposed pipelines, highlighting the challenges posed by the dataset's characteristics and providing a benchmark for future studies in interpretable and reliable AI-based cytological diagnosis. The results underscore the importance of effective model architectures and data-centric strategies for accurate multi-label classification of single-cell images.
A novel expert-annotated single-cell dataset for thyroid cancer diagnosis with deep learning benchmarks. PLOS Digital Health 4(12): e0001120. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707618/pdf/
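One of the imbalance remedies named above, loss re-weighting, can be sketched with "balanced" inverse-frequency weights over the nine nuclear features, following the common scheme weight_k = N / (K * count_k). This mirrors a standard formulation rather than the authors' exact loss, and the toy label sets are illustrative:

```python
from collections import Counter

def label_weights(label_sets, n_labels):
    # "Balanced" inverse-frequency weights: weight_k = N / (K * count_k),
    # so rarely annotated nuclear features contribute more to the loss.
    counts = Counter(l for labels in label_sets for l in labels)
    n = len(label_sets)
    return [n / (n_labels * counts[k]) if counts[k] else 0.0
            for k in range(n_labels)]
```

These per-label weights would typically be passed to a weighted binary cross-entropy loss in the multi-label training loop.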
Pub Date: 2025-12-15 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0001126
Tamara Pscheidl, Carina Benstoem, Kelly Ansems, Lena Saal-Bauernschubert, Anne Ritter, Ana-Mihaela Zorger, Karolina Dahms, Sandra Dohmen, Eva Steinfeld, Julia Dormann, Claire Iannizzi, Nicole Skoetz, Heidrun Janka, Maria-Inti Metzendorf, Carla Nau, Miriam Stegemann, Patrick Meybohm, Falk von Dincklage, Sven Laudi, Falk Fichtner, Stephanie Weibel
Given the growing challenges in healthcare, including an aging population and increasing shortages of specialized intensive care staff, this systematic review investigates the efficacy of telemedicine in intensive care, compared with standard of care (SoC) or any other type or mode of telemedicine, on patient-relevant outcomes for adult intensive care unit (ICU) patients. This systematic review follows Cochrane's methodological standards. Comprehensive searches for any controlled clinical studies were conducted in MEDLINE, Scopus, CINAHL, and CENTRAL (up to 18 April 2024, with an updated search for randomized controlled trials (RCTs) up to 29 September 2025). Twenty-six studies comparing telemedicine in intensive care to SoC, with approximately 2,164,508 analysed patients, were identified, including data from one cluster RCT (cRCT), two stepped-wedge cluster RCTs (sw-cRCTs), and 23 non-randomized studies of interventions (NRSIs). No other comparisons were identified. Due to high clinical and methodological heterogeneity among studies, no meta-analysis was conducted. For ICU mortality, one cRCT (15,230 patients) and two sw-cRCTs (5,915 patients) showed heterogeneous results: two found no evidence for a difference, while one favoured SoC (very low-certainty). One sw-cRCT (1,462 patients) reporting overall mortality at 180 days suggested no evidence for a difference between groups (very low-certainty). Data from one cRCT (15,230 patients) and one sw-cRCT (1,462 patients) on ICU length of stay (LOS) showed no evidence for a difference between groups (moderate- and very low-certainty, respectively). Quality of life, from one sw-cRCT (786 patients), indicated no evidence for a difference (very low-certainty). Six NRSIs reported adjusted data on ICU mortality, two on overall mortality, and three on ICU LOS, with heterogeneous results.
High risk of bias and substantial heterogeneity limited the certainty of the evidence, emphasizing the need for robust, patient-centered clinical research to define telemedicine's role in intensive care and optimize its implementation. Future studies should particularly ensure transparent and comprehensive reporting.
Telemedicine in adult intensive care: A systematic review of patient-relevant outcomes and methodological considerations. PLOS Digital Health 4(12): e0001126. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12704867/pdf/
Pub Date: 2025-12-11 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0001133
Alexander J Wray, Katelyn R O'Bright, Shiran Zhong, Sean Doherty, Michael Luubert, Jed Long, Catherine E Reining, Christopher J Lemieux, Jon Salter, Jason Gilliland
Smartphones have become a widely used tool for delivering digital health interventions and conducting observational research. Many digital health studies adopt an ecological momentary assessment (EMA) methodology, which can be enhanced by collecting participant location data using built-in smartphone technologies. However, there is currently a lack of customizable software capable of supporting geographically explicit EMA research. To address this gap, we developed the Healthy Environments and Active Living for Translational Health (HEALTH) Platform. The HEALTH Platform is a customizable smartphone application that enables researchers to deliver geographic ecological momentary assessment (GEMA) prompts in real time based on spatially complex geofence boundaries, to collect audiovisual data, and to flexibly adjust system logic without requiring time-consuming updates to participants' devices. We illustrate the HEALTH Platform's capabilities through a study of park exposure and well-being, which shows how the platform improves upon existing GEMA software by offering greater customization and real-time flexibility in data collection and participant prompting. We observed that survey prompt adherence was associated with participant motivation and with the complexity of the survey instrument itself, consistent with past EMA research findings. Overall, the HEALTH Platform offers a flexible solution for implementing GEMA in digital health research and practice.
The Healthy Environments and Active Living for Translational Health (HEALTH) Platform: A smartphone-based system for geographic ecological momentary assessment research. PLOS Digital Health 4(12): e0001133. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12697974/pdf/
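Geofence-triggered prompting of the kind described above reduces, in its simplest circular form, to a haversine distance test between the device fix and a fence centre. The HEALTH Platform supports more complex polygon boundaries; this stdlib sketch with made-up coordinates shows only the core idea:

```python
import math

def in_geofence(lat, lon, center_lat, center_lon, radius_m):
    # Haversine great-circle distance between the device fix and the
    # geofence centre; a GEMA prompt would fire when inside the radius.
    R = 6371000.0  # mean Earth radius, metres
    p1, p2 = math.radians(lat), math.radians(center_lat)
    dphi = math.radians(center_lat - lat)
    dlmb = math.radians(center_lon - lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a)) <= radius_m
```

In practice the platform would evaluate this (or a point-in-polygon test) against each participant's streamed location fixes and throttle prompts to avoid repeated triggering at a boundary.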
Pub Date: 2025-12-11 | eCollection Date: 2025-12-01 | DOI: 10.1371/journal.pdig.0000782
Raji Tajudeen, Mosoka Papa Fallah, John Ojo, Tamrat Shaweno, Michael Sileshi Mekbib, Frehiwot Mulugeta, Wondwossen Amanuel, Moses Bamatura, Dennis Kibiye, Patrick Chanda Kabwe, Senga Sembuche, Ngashi Ngongo, Nebiyu Dereje, Jean Kaseya
The DHIS2 system enabled real-time tracking of vaccine distribution and administration to facilitate data-driven decisions. Experts from the Africa Centres for Disease Control and Prevention (Africa CDC) Monitoring and Evaluation (M&E) and Management Information System (MIS) teams, with support from the Health Information Systems Program South Africa (HISP-SA), developed the continental COVID-19 vaccination tracking system. Several variables related to COVID-19 vaccination were considered in developing the system. Three hundred and fifty users can access the system at different levels with specific roles and privileges. Four dashboards with high-level summary visualizations were developed for top leadership decision-making, while pages with detailed programmatic results are available to other users depending on their level of access. Africa CDC staff at different levels with a role-based account can view and interact with the dashboards and make necessary decisions based on the COVID-19 vaccination data from program implementation areas on the continent. The Africa CDC vaccination program dashboard provided essential information for public health officials to monitor continental COVID-19 vaccination efforts and guide timely decisions. As the impact of COVID-19 is not yet over, continental tracking of COVID-19 vaccine uptake and the dashboard visualizations continue to provide context on vaccination coverage and on other metrics that may affect vaccine uptake across the continent. The lessons learned during the development and implementation of a continental COVID-19 vaccination tracking and visualization dashboard may be applied to various other public health events of continental and global concern.
COVID-19 vaccination data management and visualization systems for improved decision-making: Lessons learnt from Africa CDC Saving Lives and Livelihoods program. PLOS Digital Health 4(12): e0000782. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12697957/pdf/
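At its core, a summary dashboard tile of the kind described rolls facility-level records up to the geography each user is allowed to see. This is a generic stdlib sketch with a hypothetical record shape, not the DHIS2 data model or API:

```python
from collections import defaultdict

def rollup(records, level="country"):
    # Aggregate dose counts to one geography level for a summary tile.
    # The record shape (dicts with "country" and "doses" keys) is
    # hypothetical and stands in for DHIS2 tracked-entity data.
    totals = defaultdict(int)
    for rec in records:
        totals[rec[level]] += rec["doses"]
    return dict(totals)
```

Role-based access would then filter which keys of the rollup a given user's dashboard may display.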
Pub Date : 2025-12-10eCollection Date: 2025-12-01DOI: 10.1371/journal.pdig.0000955
T'ng Chang Kwok, Chao Chen, Jayaprakash Veeravalli, Carol A C Coupland, Edmund Juszczak, Jonathan Garibaldi, Kirsten Mitchell, Kate L Francis, Christopher J D McKinlay, Brett J Manley, Don Sharkey
Decision-making in perinatal management of extremely preterm infants is challenging, and mortality prediction tools may support it. We used population-based, routinely entered electronic patient record data from 25,902 infants born between 23+0 and 27+6 weeks' gestation and admitted to 185 English and Welsh neonatal units from 2010-2020 to develop and internally validate an online tool predicting mortality before neonatal discharge. Comparing nine machine learning approaches, we developed an explainable tool based on stepwise backward logistic regression (https://premoutcome.shinyapps.io/Death/). The tool demonstrated good discrimination (area under the receiver operating characteristic curve of 0.746, 95% confidence interval 0.729-0.762) and good calibration, with superior net benefit across probability thresholds of 10%-70%. It also showed better calibration and clinical utility than previously published models, and acceptable performance in a multinational external validation cohort of preterm infants. Following further evaluation, this tool may help support high-risk perinatal decision-making.
"Developing and validating an explainable digital mortality prediction tool for extremely preterm infants." PLOS digital health. 2025;4(12):e0000955. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12694798/pdf/
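A minimal sketch of the modeling pattern the authors name, backward stepwise logistic regression, selected here by training-set discrimination (AUC) on synthetic data. The fitting routine, elimination criterion, tolerance, and features are illustrative assumptions, not the published tool.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Plain gradient-descent logistic regression with an intercept."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def auc(y, scores):
    """Rank-based AUC: probability a positive case outranks a negative one."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def backward_stepwise(X, y, tol=0.005):
    """Drop features one at a time while AUC degrades by less than `tol`."""
    feats = list(range(X.shape[1]))
    best = auc(y, predict(fit_logistic(X[:, feats], y), X[:, feats]))
    while len(feats) > 1:
        trials = []
        for f in feats:
            kept = [g for g in feats if g != f]
            w = fit_logistic(X[:, kept], y)
            trials.append((auc(y, predict(w, X[:, kept])), f))
        a, f = max(trials)          # cheapest feature to drop
        if best - a > tol:          # every removal costs too much: stop
            break
        feats.remove(f)
        best = a
    return feats, best

# Demo on synthetic data: two informative features, two pure-noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-(2 * X[:, 0] - 2 * X[:, 1])))).astype(float)
kept, train_auc = backward_stepwise(X, y)
```

In practice, published tools of this kind typically eliminate by p-value or information criterion and validate AUC on held-out data; the AUC-based criterion above just keeps the sketch self-contained.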
Pub Date : 2025-12-04eCollection Date: 2025-12-01DOI: 10.1371/journal.pdig.0001106
Moran Sorka, Alon Gorenshtein, Dvir Aran, Shahar Shelly
Large language models (LLMs) have demonstrated impressive capabilities in medical domains, yet their ability to handle the specialized reasoning patterns required in clinical neurology warrants systematic evaluation. Neurological assessment presents distinctive challenges that combine anatomical localization, temporal pattern recognition, and nuanced symptom interpretation: cognitive processes that are specifically tested in board certification examinations. We developed a comprehensive benchmark comprising 305 questions from Israeli Board Certification Exams in Neurology and classified each along three dimensions of complexity: factual knowledge depth, clinical concept integration, and reasoning complexity. We evaluated ten LLMs of varying architectures and specializations using this benchmark, testing base models, retrieval-augmented generation (RAG) enhancement, and a novel multi-agent system. Our analysis revealed significant performance variation across models and methodologies. The OpenAI-o1 model achieved the highest base performance (90.9% accuracy), while specialized medical models performed surprisingly poorly (52.9% for Meditron-70B). RAG enhancement provided variable benefits across models: substantial improvements for mid-tier models such as GPT-4o (80.5% to 87.3%) and for smaller models, but limited effectiveness on the highest-complexity questions regardless of model size. In contrast, our multi-agent framework, which decomposes neurological reasoning into specialized cognitive functions (question analysis, knowledge retrieval, answer synthesis, and validation), achieved dramatic improvements, especially for mid-range models. The LLaMA 3.3-70B-based agentic system reached 89.2% accuracy compared to 69.5% for its base model, with particularly substantial gains on level 3 complexity questions across all dimensions.
External validation on MedQA revealed dataset-specific RAG effects: while RAG improved board certification performance, it showed minimal benefit on MedQA questions (LLaMA 3.3-70B: +1.4% vs. +3.9% on board exams), reflecting alignment between our specialized neurology textbook and board examination content rather than the broader medical knowledge required for MedQA. Most notably, the multi-agent approach transformed inconsistent subspecialty performance into remarkably uniform excellence, effectively addressing the neurological reasoning challenges that persisted even with RAG enhancement. We further validated our approach using an independent dataset of 155 neurological cases extracted from MedQA. The results confirm that structured multi-agent approaches designed to emulate specialized cognitive processes significantly enhance complex medical reasoning, offering promising directions for AI assistance in challenging clinical contexts.
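The staged decomposition described above (question analysis, knowledge retrieval, answer synthesis, validation) can be sketched as a simple pipeline. `call_llm` is a hypothetical hook, stubbed here with a deterministic placeholder so the control flow runs without a model backend; it is not the authors' implementation.

```python
# A sketch of the staged multi-agent pattern described above. Every function
# name here is illustrative; only the four-stage structure comes from the text.
from dataclasses import dataclass

def call_llm(role: str, prompt: str) -> str:
    # Stub: a real system would route this to an LLM API per agent role.
    return f"[{role}] {prompt[:48]}"

@dataclass
class CaseState:
    question: str
    analysis: str = ""
    evidence: str = ""
    draft: str = ""
    validated: bool = False

def analyze(s: CaseState) -> CaseState:
    # Question analysis: localization and temporal-pattern extraction.
    s.analysis = call_llm("analyst", f"Localize and characterize: {s.question}")
    return s

def retrieve(s: CaseState) -> CaseState:
    # Knowledge retrieval: fetch supporting passages for the analysis.
    s.evidence = call_llm("retriever", f"Find sources for: {s.analysis}")
    return s

def synthesize(s: CaseState) -> CaseState:
    # Answer synthesis: draft an answer grounded in the evidence.
    s.draft = call_llm("synthesizer", f"Answer using: {s.evidence}")
    return s

def validate(s: CaseState) -> CaseState:
    # Validation: accept the draft only if the checker returns a verdict.
    s.validated = bool(call_llm("validator", f"Check: {s.draft}"))
    return s

def run_pipeline(question: str) -> CaseState:
    state = CaseState(question=question)
    for stage in (analyze, retrieve, synthesize, validate):
        state = stage(state)
    return state
```

The point of the decomposition is that each stage sees only the output of the previous one, so errors can be caught at the validation step instead of propagating silently.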
"A multi-agent approach to neurological clinical reasoning." PLOS digital health. 2025;4(12):e0001106. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12677565/pdf/
Pub Date : 2025-12-01DOI: 10.1371/journal.pdig.0001109
Prajwal Ghimire, Keyoumars Ashkan
The term "artificial" implies an inherent dichotomy from the natural or organic. However, AI as we know it is a product of organic ingenuity: designed, implemented, and iteratively improved by human cognition. The very principles that underpin AI systems, from neural networks to decision-making algorithms, are inspired by the organic intelligence embedded in human neurobiology and evolutionary processes. The path from "organic" to "artificial" intelligence in digital health is neither mystical nor merely a matter of parameter count; it is fundamentally about organization and adaptation. Thus, the boundaries between "artificial" and "organic" are far less distinct than the nomenclature suggests.
"From artificial to organic: Rethinking the roots of intelligence for digital health." PLOS digital health. 2025;4(12):e0001109. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12668481/pdf/