Pub Date: 2025-12-16. eCollection Date: 2025-12-01. DOI: 10.1371/journal.pdig.0001120
Nguyen Quang Huy, Thanh-Ha Do, Nguyen Van De, Hoang Kim Giap, Vu Huyen Tram
This paper introduces a novel, expert-annotated single-cell image dataset for thyroid cancer diagnosis, comprising 3,419 individual cell images extracted from high-resolution histopathological slides and annotated with nine clinically significant nuclear features. The dataset, collected and annotated in collaboration with pathologists at the 108 Military Central Hospital (Vietnam), presents a significant resource for advancing research in automated cytological analysis. We establish a series of robust deep-learning baseline pipelines for multi-label classification on this dataset. These baselines incorporate ConvNeXt, Vision Transformer (ViT), and ResNet backbones, along with techniques to address class imbalance, including conditional CutMix, weighted sampling, and SPA loss with Label Pairwise Regularization (LPR). Experiments evaluate the performance of the proposed pipelines, highlighting the challenges posed by the dataset's characteristics and providing a benchmark for future studies in interpretable and reliable AI-based cytological diagnosis. The results highlight the importance of effective model architectures and data-centric strategies for accurate multi-label classification of single-cell images.
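As a rough illustration of the kind of baseline described above (not the authors' released pipeline), the sketch below wires a torchvision ResNet backbone to a nine-logit multi-label head and uses two common imbalance remedies, weighted sampling and per-label pos_weight in the loss. The CellDataset, sample weights, and hyperparameters are hypothetical placeholders; the paper's SPA loss with LPR and conditional CutMix are not reproduced here.

```python
# Minimal multi-label baseline sketch (assumed PyTorch/torchvision setup);
# dataset, sample weights, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import models

NUM_LABELS = 9  # nine nuclear features, per the dataset description

def build_model() -> nn.Module:
    # ResNet-50 backbone with a 9-logit head; sigmoid is applied inside the loss.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_LABELS)
    return backbone

def make_loader(dataset, per_sample_weights, batch_size=32):
    # Weighted sampling: images carrying rare labels are drawn more often.
    sampler = WeightedRandomSampler(per_sample_weights,
                                    num_samples=len(per_sample_weights),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

def train_step(model, images, targets, pos_weight, optimizer):
    # BCE-with-logits with per-label pos_weight is one simple imbalance remedy;
    # it stands in for, and is not, the paper's SPA loss with LPR.
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    optimizer.zero_grad()
    loss = criterion(model(images), targets.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```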
{"title":"A novel expert-annotated single-cell dataset for thyroid cancer diagnosis with deep learning benchmarks.","authors":"Nguyen Quang Huy, Thanh-Ha Do, Nguyen Van De, Hoang Kim Giap, Vu Huyen Tram","doi":"10.1371/journal.pdig.0001120","DOIUrl":"10.1371/journal.pdig.0001120","url":null,"abstract":"<p><p>This paper introduces a novel, expert-annotated single-cell image dataset for thyroid cancer diagnosis, comprising 3,419 individual cell images extracted from high-resolution histopathological slides and annotated with nine clinically significant nuclear features. The dataset, collected and annotated in collaboration with pathologists at the 108 Military Central Hospital (Vietnam), presents a significant resource for advancing research in automated cytological analysis. We establish a series of robust deep-learning baseline pipelines for multi-label classification on this dataset. These baselines incorporate ConvNeXt, Vision Transformers (ViT), and ResNet backbones, along with techniques to address class imbalance, including conditional CutMix, weighted sampling, and SPA loss with Label Pairwise Regularization (LPR). Experiments evaluate the good performance of the proposed pipelines, demonstrating the challenges over the dataset's characteristics and providing a benchmark for future studies in interpretable and reliable AI-based cytological diagnosis. The results highlight the importance of effective model architectures and data-centric strategies for accurate multi-label classification of single-cell images.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0001120"},"PeriodicalIF":7.7,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707618/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145770222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-15. eCollection Date: 2025-12-01. DOI: 10.1371/journal.pdig.0001126
Tamara Pscheidl, Carina Benstoem, Kelly Ansems, Lena Saal-Bauernschubert, Anne Ritter, Ana-Mihaela Zorger, Karolina Dahms, Sandra Dohmen, Eva Steinfeld, Julia Dormann, Claire Iannizzi, Nicole Skoetz, Heidrun Janka, Maria-Inti Metzendorf, Carla Nau, Miriam Stegemann, Patrick Meybohm, Falk von Dincklage, Sven Laudi, Falk Fichtner, Stephanie Weibel
Given the growing challenges facing healthcare, including an aging population and increasing shortages of specialized intensive care staff, this systematic review investigates the efficacy of telemedicine in intensive care compared with standard of care (SoC), or with any other type or mode of telemedicine, with respect to patient-relevant outcomes for adult intensive care unit (ICU) patients. The review follows Cochrane's methodological standards. Comprehensive searches for any controlled clinical studies were conducted in MEDLINE, Scopus, CINAHL, and CENTRAL (up to 18 April 2024, with an updated search for randomized controlled trials (RCTs) up to 29 September 2025). Twenty-six studies comparing telemedicine in intensive care to SoC, with approximately 2,164,508 analysed patients, were identified, including data from one cluster RCT (cRCT), two stepped-wedge cluster RCTs (sw-cRCTs), and 23 non-randomized studies of interventions (NRSIs). No other comparisons were identified. Due to high clinical and methodological heterogeneity among studies, no meta-analysis was conducted. For ICU mortality, one cRCT (15,230 patients) and two sw-cRCTs (5,915 patients) showed heterogeneous results: two found no evidence of a difference, while one favoured SoC (very low certainty). One sw-cRCT (1,462 patients) reporting overall mortality at 180 days suggested no evidence of a difference between groups (very low certainty). Data from one cRCT (15,230 patients) and one sw-cRCT (1,462 patients) on ICU length of stay (LOS) showed no evidence of a difference between groups (moderate and very low certainty). Quality-of-life data from one sw-cRCT (786 patients) indicated no evidence of a difference (very low certainty). Six NRSIs reported adjusted data on ICU mortality, two on overall mortality, and three on ICU LOS, with heterogeneous results. High risk of bias and substantial heterogeneity limited the certainty of the evidence, emphasizing the need for robust, patient-centered clinical research to define telemedicine's role in intensive care and optimize its implementation. Future studies should particularly ensure transparent and comprehensive reporting.
{"title":"Telemedicine in adult intensive care: A systematic review of patient-relevant outcomes and methodological considerations.","authors":"Tamara Pscheidl, Carina Benstoem, Kelly Ansems, Lena Saal-Bauernschubert, Anne Ritter, Ana-Mihaela Zorger, Karolina Dahms, Sandra Dohmen, Eva Steinfeld, Julia Dormann, Claire Iannizzi, Nicole Skoetz, Heidrun Janka, Maria-Inti Metzendorf, Carla Nau, Miriam Stegemann, Patrick Meybohm, Falk von Dincklage, Sven Laudi, Falk Fichtner, Stephanie Weibel","doi":"10.1371/journal.pdig.0001126","DOIUrl":"10.1371/journal.pdig.0001126","url":null,"abstract":"<p><p>Given the growing challenges of healthcare, including an aging population and increasing shortages of specialized intensive care staff, this systematic review investigates the efficacy of telemedicine in intensive care compared to standard of care (SoC) or any other type or mode of telemedicine on patient-relevant outcomes for adult intensive care unit (ICU) patients. This systematic review follows Cochrane's methodological standards. Comprehensive searches for any controlled clinical studies were conducted in MEDLINE, Scopus, CINAHL, and CENTRAL (up to 18 April 2024, and an updated search for randomized controlled trials (RCTs) up to 29 September 2025). Twenty-six studies comparing telemedicine in intensive care to SoC with approximately 2,164,508 analysed patients were identified, including data from one cluster RCT (cRCT), two stepped-wedge cluster RCTs (sw-cRCTs), and 23 non-randomized studies of interventions (NRSIs). No other comparisons were identified. Due to high clinical and methodological heterogeneity among studies, no meta-analysis was conducted. For ICU mortality, one cRCT (15,230 patients) and two sw-cRCTs (5,915 patients) showed heterogeneous results: two found no evidence for a difference, while one favoured SoC (very low-certainty). One sw-cRCT (1,462 patients) reporting overall mortality at 180 days suggested no evidence for a difference between groups (very low-certainty). Data from one cRCT (15,230 patients) and one sw-cRCT (1,462 patients) on ICU length of stay (LOS) showed no evidence for a difference between groups (moderate- and very low-certainty). Quality of life from one sw-cRCT (786 patients) indicated no evidence for a difference (very low-certainty). Six NRSIs reported adjusted data on ICU mortality, two on overall mortality, and three on ICU LOS, with heterogeneous results. High risk of bias and substantial heterogeneity limited the certainty, emphasizing the need for robust, patient-centered research in clinical studies to define telemedicine's role in intensive care and optimize its implementation. Future studies should particularly ensure transparent and comprehensive reporting.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0001126"},"PeriodicalIF":7.7,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12704867/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145764757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-11. eCollection Date: 2025-12-01. DOI: 10.1371/journal.pdig.0001133
Alexander J Wray, Katelyn R O'Bright, Shiran Zhong, Sean Doherty, Michael Luubert, Jed Long, Catherine E Reining, Christopher J Lemieux, Jon Salter, Jason Gilliland
Smartphones have become a widely used tool for delivering digital health interventions and conducting observational research. Many digital health studies adopt an ecological momentary assessment (EMA) methodology, which can be enhanced by collecting participant location data using built-in smartphone technologies. However, there is currently a lack of customizable software capable of supporting geographically explicit EMA research. To address this gap, we developed the Healthy Environments and Active Living for Translational Health (HEALTH) Platform. The HEALTH Platform is a customizable smartphone application that enables researchers to deliver geographic ecological momentary assessment (GEMA) prompts in real time based on spatially complex geofence boundaries, to collect audiovisual data, and to flexibly adjust system logic without requiring time-consuming updates to participants' devices. We illustrate the HEALTH Platform's capabilities through a study of park exposure and well-being, which shows how the platform improves upon existing GEMA software by offering greater customization and real-time flexibility in data collection and participant prompting. We observed that survey prompt adherence was associated with participant motivation and the complexity of the survey instrument itself, consistent with past EMA research findings. Overall, the HEALTH Platform offers a flexible solution for implementing GEMA in digital health research and practice.
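To make geofence-triggered prompting concrete, here is a minimal sketch (not the HEALTH Platform's actual code) that checks whether a participant's GPS fix falls inside a park polygon and, if so, fires a GEMA prompt. The polygon coordinates, survey identifier, and send_prompt hook are hypothetical.

```python
# Illustrative geofence check for GEMA prompting; coordinates and the
# send_prompt hook are placeholders, not the HEALTH Platform's implementation.
from shapely.geometry import Point, Polygon

# Hypothetical park boundary as (longitude, latitude) vertices.
PARK_GEOFENCE = Polygon([
    (-81.280, 43.010),
    (-81.270, 43.010),
    (-81.270, 43.018),
    (-81.280, 43.018),
])

def maybe_prompt(lon: float, lat: float, send_prompt) -> bool:
    """Trigger a survey prompt when the location fix lies inside the geofence."""
    inside = PARK_GEOFENCE.contains(Point(lon, lat))
    if inside:
        send_prompt("park_wellbeing_survey")  # GEMA instrument id (placeholder)
    return inside

if __name__ == "__main__":
    # Stand-in prompt sender for demonstration.
    maybe_prompt(-81.275, 43.014,
                 send_prompt=lambda survey_id: print("prompt:", survey_id))
```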
{"title":"The Healthy Environments and Active Living for Translational Health (HEALTH) Platform: A smartphone-based system for geographic ecological momentary assessment research.","authors":"Alexander J Wray, Katelyn R O'Bright, Shiran Zhong, Sean Doherty, Michael Luubert, Jed Long, Catherine E Reining, Christopher J Lemieux, Jon Salter, Jason Gilliland","doi":"10.1371/journal.pdig.0001133","DOIUrl":"10.1371/journal.pdig.0001133","url":null,"abstract":"<p><p>Smartphones have become a widely used tool for delivering digital health interventions and conducting observational research. Many digital health studies adopt an ecological momentary assessment (EMA) methodology, which can be enhanced by collecting participant location data using built-in smartphone technologies. However, there is currently a lack of customizable software capable of supporting geographically explicit research in EMA. To address this gap, we developed the Healthy Environments and Active Living for Translational Health (HEALTH) Platform. The HEALTH Platform is a customizable smartphone application that enables researchers to deliver geographic ecological momentary assessment (GEMA) prompts on a smartphone in real-time based on spatially complex geofence boundaries, to collect audiovisual data, and to flexibly adjust system logic without requiring time-consuming updates to participants' devices. We illustrate the HEALTH Platform's capabilities through a study of park exposure and well-being. This study illustrates how the HEALTH Platform improves upon existing GEMA software platforms by offering greater customization and real-time flexibility in data collection and prompting participants. We observed survey prompt adherence is associated with participant motivation and the complexity of the survey instrument itself, following past EMA research findings. Overall, the HEALTH Platform offers a flexible solution for implementing GEMA in digital health research and practice.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0001133"},"PeriodicalIF":7.7,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12697974/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145745995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-11. eCollection Date: 2025-12-01. DOI: 10.1371/journal.pdig.0000782
Raji Tajudeen, Mosoka Papa Fallah, John Ojo, Tamrat Shaweno, Michael Sileshi Mekbib, Frehiwot Mulugeta, Wondwossen Amanuel, Moses Bamatura, Dennis Kibiye, Patrick Chanda Kabwe, Senga Sembuche, Ngashi Ngongo, Nebiyu Dereje, Jean Kaseya
The DHIS2 system enabled real-time tracking of vaccine distribution and administration to facilitate data-driven decisions. Experts from the Africa Centres for Disease Control and Prevention (Africa CDC) Monitoring and Evaluation (M&E) and Management Information System (MIS) teams, with support from the Health Information Systems Program South Africa (HISP-SA), developed the continental COVID-19 vaccination tracking system. Several variables related to COVID-19 vaccination were considered in developing the system. Three hundred fifty users can access the system at different levels with specific roles and privileges. Four dashboards with high-level summary visualizations were developed for top leadership to support decision-making, while pages with detailed programmatic results are available to other users depending on their level of access. Africa CDC staff at different levels with a role-based account can view and interact with the dashboards and make necessary decisions based on COVID-19 vaccination data from program implementation areas on the continent. The Africa CDC vaccination program dashboard provided essential information for public health officials to monitor continental COVID-19 vaccination efforts and guide timely decisions. As the impact of COVID-19 is not yet over, continental tracking of vaccine uptake and the dashboard visualizations continue to provide context on vaccination coverage and on other metrics that may influence uptake across the continent. The lessons learned during the development and implementation of a continental COVID-19 vaccination tracking and visualization dashboard may be applied to various other public health events of continental and global concern.
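The abstract does not give implementation detail, but dashboards of this kind typically read aggregated values from a DHIS2 analytics Web API. The sketch below shows one way such a pull could look; the instance URL, credentials, and indicator/org-unit identifiers are placeholders, and the query shape is an assumption about a generic DHIS2 deployment rather than a description of the Africa CDC system.

```python
# Hypothetical pull of aggregated vaccination indicators from a DHIS2 instance.
# URL, credentials, and UIDs are placeholders; the query shape assumes the
# standard DHIS2 analytics Web API and is not taken from the Africa CDC system.
import requests

BASE_URL = "https://dhis2.example.org"      # placeholder instance
INDICATOR_UID = "COVIDVaxDoses123"          # placeholder indicator UID

def fetch_monthly_doses(session: requests.Session):
    params = {
        "dimension": [
            f"dx:{INDICATOR_UID}",   # what: doses administered
            "pe:LAST_12_MONTHS",     # when: monthly periods
            "ou:LEVEL-1",            # where: country-level org units
        ],
        "displayProperty": "NAME",
    }
    resp = session.get(f"{BASE_URL}/api/analytics.json", params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()  # rows of (data item, period, org unit, value)

if __name__ == "__main__":
    with requests.Session() as s:
        s.auth = ("report_user", "********")  # placeholder credentials
        data = fetch_monthly_doses(s)
        print(len(data.get("rows", [])), "rows returned")
```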
{"title":"COVID-19 vaccination data management and visualization systems for improved decision-making: Lessons learnt from Africa CDC Saving Lives and Livelihoods program.","authors":"Raji Tajudeen, Mosoka Papa Fallah, John Ojo, Tamrat Shaweno, Michael Sileshi Mekbib, Frehiwot Mulugeta, Wondwossen Amanuel, Moses Bamatura, Dennis Kibiye, Patrick Chanda Kabwe, Senga Sembuche, Ngashi Ngongo, Nebiyu Dereje, Jean Kaseya","doi":"10.1371/journal.pdig.0000782","DOIUrl":"10.1371/journal.pdig.0000782","url":null,"abstract":"<p><p>The DHIS2 system enabled real-time tracking of vaccine distribution and administration to facilitate data-driven decisions. Experts from the Africa Centres for Disease Control and Prevention (Africa CDC) Monitoring and Evaluation (M&E) and Management Information System (MIS) teams, with support from the Health Information Systems Program South Africa (HISP-SA), developed the continental COVID-19 vaccination tracking system. Several variables related to COVID-19 vaccination were considered in developing the system. Three-hundred fifty users can access the system at different levels with specific roles and privileges. Four dashboards with high-level summary visualizations were developed for top leadership for decision-making, while pages with detailed programmatic results are available to other users depending on their level of access. Africa CDC staff at different levels with a role-based account can view and interact with the dashboards and make necessary decisions based on the COVID-19 vaccination data from program implementation areas on the continent. The Africa CDC vaccination program dashboard provided essential information for public health officials to monitor the continental COVID-19 vaccination efforts and guide timely decisions. As the impact of COVID-19 is not yet over, the continental tracking of COVID-19 vaccine uptake and dashboard visualizations are used to provide the context of continental COVID-19 vaccination coverage and multiple other metrics that may impact the continental COVID-19 vaccine uptake. The lessons learned during the development and implementation of a continental COVID-19 vaccination tracking and visualization dashboard may be applied across various other public health events of continental and global concern.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0000782"},"PeriodicalIF":7.7,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12697957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145746000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-10. eCollection Date: 2025-12-01. DOI: 10.1371/journal.pdig.0000955
T'ng Chang Kwok, Chao Chen, Jayaprakash Veeravalli, Carol A C Coupland, Edmund Juszczak, Jonathan Garibaldi, Kirsten Mitchell, Kate L Francis, Christopher J D McKinlay, Brett J Manley, Don Sharkey
Decision-making in the perinatal management of extremely preterm infants is challenging. Mortality prediction tools may support decision-making. We used population-based, routinely entered electronic patient record data from 25,902 infants born between 23+0 and 27+6 weeks' gestation and admitted to 185 English and Welsh neonatal units between 2010 and 2020 to develop and internally validate an online tool to predict mortality before neonatal discharge. Comparing nine machine learning approaches, we developed an explainable tool based on stepwise backward logistic regression (https://premoutcome.shinyapps.io/Death/). The tool demonstrated good discrimination (area under the receiver operating characteristic curve of 0.746, 95% confidence interval 0.729-0.762) and calibration, with superior net benefit across probability thresholds of 10%-70%. It also demonstrated better calibration and clinical utility than previously published models. Acceptable performance was demonstrated in a multinational external validation cohort of preterm infants. This tool may be useful to support high-risk perinatal decision-making following further evaluation.
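For readers unfamiliar with the quantities reported above, the sketch below fits a plain logistic regression on synthetic data and computes AUROC and decision-curve net benefit on a held-out split. The predictors and outcome are simulated placeholders, and no stepwise backward selection is performed, so this is not the published tool, only an illustration of the evaluation metrics.

```python
# Illustrative discrimination and net-benefit evaluation for a mortality model;
# synthetic data stands in for the neonatal cohort.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))                                    # placeholder predictors
y = (rng.random(5000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)   # synthetic outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]

print("AUROC:", round(roc_auc_score(y_te, p), 3))

def net_benefit(y_true, p_hat, threshold):
    # Net benefit = TP/n - FP/n * (pt / (1 - pt)), the quantity behind decision curves.
    pred_pos = p_hat >= threshold
    tp = np.sum(pred_pos & (y_true == 1))
    fp = np.sum(pred_pos & (y_true == 0))
    n = len(y_true)
    return tp / n - fp / n * (threshold / (1 - threshold))

for pt in (0.1, 0.3, 0.5, 0.7):
    print(f"net benefit at pt={pt}:", round(net_benefit(y_te, p, pt), 4))
```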
{"title":"Developing and validating an explainable digital mortality prediction tool for extremely preterm infants.","authors":"T'ng Chang Kwok, Chao Chen, Jayaprakash Veeravalli, Carol A C Coupland, Edmund Juszczak, Jonathan Garibaldi, Kirsten Mitchell, Kate L Francis, Christopher J D McKinlay, Brett J Manley, Don Sharkey","doi":"10.1371/journal.pdig.0000955","DOIUrl":"10.1371/journal.pdig.0000955","url":null,"abstract":"<p><p>Decision-making in perinatal management of extremely preterm infants is challenging. Mortality prediction tools may support decision-making. We used population-based routinely entered electronic patient record data from 25,902 infants born between 23+0-27+6 weeks' gestation and admitted to 185 English and Welsh neonatal units from 2010-2020 to develop and internally validate an online tool to predict mortality before neonatal discharge. Comparing nine machine learning approaches, we developed an explainable tool based on stepwise backward logistic regression (https://premoutcome.shinyapps.io/Death/). The tool demonstrated good discrimination (area under the receiver operating characteristics curve (95% confidence interval) of 0.746 (0.729-0.762)) and calibration with superior net benefit across probability thresholds of 10%-70%. Our tool also demonstrated superior calibration and utility performance than previously published models. Acceptable performance was demonstrated in a multinational, external validation cohort of preterm infants. This tool may be useful to support high-risk perinatal decision-making following further evaluation.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0000955"},"PeriodicalIF":7.7,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12694798/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-04. eCollection Date: 2025-12-01. DOI: 10.1371/journal.pdig.0001106
Moran Sorka, Alon Gorenshtein, Dvir Aran, Shahar Shelly
Large language models (LLMs) have demonstrated impressive capabilities in medical domains, yet their ability to handle the specialized reasoning patterns required in clinical neurology warrants systematic evaluation. Neurological assessment presents distinctive challenges that combine anatomical localization, temporal pattern recognition, and nuanced symptom interpretation: cognitive processes that are specifically tested in board certification examinations. We developed a comprehensive benchmark comprising 305 questions from Israeli Board Certification Exams in Neurology and classified each along three dimensions of complexity: factual knowledge depth, clinical concept integration, and reasoning complexity. We evaluated ten LLMs of varying architectures and specializations using this benchmark, testing base models, retrieval-augmented generation (RAG) enhancement, and a novel multi-agent system. Our analysis revealed significant performance variation across models and methodologies. The OpenAI-o1 model achieved the highest base performance (90.9% accuracy), while specialized medical models performed surprisingly poorly (52.9% for Meditron-70B). RAG enhancement provided variable benefits across models: substantial improvements for mid-tier models such as GPT-4o (80.5% to 87.3%) and for smaller models, but limited effectiveness on the highest-complexity questions regardless of model size. In contrast, our multi-agent framework, which decomposes neurological reasoning into specialized cognitive functions (question analysis, knowledge retrieval, answer synthesis, and validation), achieved dramatic improvements, especially for mid-range models. The LLaMA 3.3-70B-based agentic system reached 89.2% accuracy compared to 69.5% for its base model, with particularly substantial gains on level-3 complexity questions across all dimensions. External validation on MedQA revealed dataset-specific RAG effects: while RAG improved board certification performance, it showed minimal benefit on MedQA questions (LLaMA 3.3-70B: +1.4% vs. +3.9% on board exams), reflecting alignment between our specialized neurology textbook and board examination content rather than the broader medical knowledge required for MedQA. Most notably, the multi-agent approach transformed inconsistent subspecialty performance into remarkably uniform excellence, effectively addressing the neurological reasoning challenges that persisted even with RAG enhancement. We further validated our approach using an independent dataset comprising 155 neurological cases extracted from MedQA. The results confirm that structured multi-agent approaches designed to emulate specialized cognitive processes significantly enhance complex medical reasoning, offering promising directions for AI assistance in challenging clinical contexts.
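The multi-agent framework is described only at the level of its stages (question analysis, knowledge retrieval, answer synthesis, validation). The sketch below shows one generic way such a pipeline can be chained around an LLM call; the `llm` callable, `corpus_search` hook, and prompts are placeholders, so this is an assumed structure rather than the authors' system.

```python
# Generic four-stage agentic pipeline sketch; `llm` is any text-in/text-out
# callable (e.g., a hosted model client) and prompts are illustrative.
from typing import Callable

Llm = Callable[[str], str]

def analyze(llm: Llm, question: str) -> str:
    return llm(f"Identify the neurological subdomain and key findings in:\n{question}")

def retrieve(llm: Llm, analysis: str, corpus_search: Callable[[str], str]) -> str:
    query = llm(f"Write a short retrieval query for: {analysis}")
    return corpus_search(query)  # e.g., a vector-store lookup (placeholder)

def synthesize(llm: Llm, question: str, context: str) -> str:
    return llm(f"Context:\n{context}\n\nAnswer the question with a single option:\n{question}")

def validate(llm: Llm, question: str, draft: str) -> str:
    verdict = llm(f"Check this answer for contradictions with the question.\n"
                  f"Question: {question}\nAnswer: {draft}\nReply 'OK' or a correction.")
    return draft if verdict.strip() == "OK" else verdict

def answer(llm: Llm, corpus_search: Callable[[str], str], question: str) -> str:
    # Chain the specialized stages: analysis -> retrieval -> synthesis -> validation.
    analysis = analyze(llm, question)
    context = retrieve(llm, analysis, corpus_search)
    draft = synthesize(llm, question, context)
    return validate(llm, question, draft)
```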
{"title":"A multi-agent approach to neurological clinical reasoning.","authors":"Moran Sorka, Alon Gorenshtein, Dvir Aran, Shahar Shelly","doi":"10.1371/journal.pdig.0001106","DOIUrl":"10.1371/journal.pdig.0001106","url":null,"abstract":"<p><p>Large language models (LLMs) have demonstrated impressive capabilities in medical domains, yet their ability to handle the specialized reasoning patterns required in clinical neurology warrants systematic evaluation. Neurological assessment presents distinctive challenges that combine anatomical localization, temporal pattern recognition, and nuanced symptom interpretation-cognitive processes that are specifically tested in board certification examinations. We developed a comprehensive benchmark comprising 305 questions from Israeli Board Certification Exams in Neurology and classified each along three dimensions of complexity: factual knowledge depth, clinical concept integration, and reasoning complexity. We evaluated ten LLMs of varying architectures and specializations using this benchmark, testing base models, retrieval-augmented generation (RAG) enhancement, and a novel multi-agent system. Our analysis revealed significant performance variation across models and methodologies. The OpenAI-o1 model achieved the highest base performance (90.9% accuracy), while specialized medical models performed surprisingly poorly (52.9% for Meditron-70B). RAG enhancement provided variable benefits across models; substantial improvements for mid-tier models like GPT-4o (80.5% to 87.3%) and smaller models, but limited effectiveness on the highest complexity questions regardless of model size. In contrast, our multi-agent framework-which decomposes neurological reasoning into specialized cognitive functions including question analysis, knowledge retrieval, answer synthesis, and validation-achieved dramatic improvements, especially for mid-range models. The LLaMA 3.3-70B-based agentic system reached 89.2% accuracy compared to 69.5% for its base model, with particularly substantial gains on level 3 complexity questions across all dimensions. External validation on MedQA revealed dataset-specific RAG effects: while RAG improved board certification performance, it showed minimal benefit on MedQA questions (LLaMA 3.3-70B: + 1.4% vs + 3.9% on board exams), reflecting alignment between our specialized neurology textbook and board examination content rather than the broader medical knowledge required for MedQA. Most notably, the multi-agent approach transformed inconsistent subspecialty performance into remarkably uniform excellence, effectively addressing the neurological reasoning challenges that persisted even with RAG enhancement. We further validated our approach using an independent dataset comprising 155 neurological cases extracted from MedQA. 
The results confirm that structured multi-agent approaches designed to emulate specialized cognitive processes significantly enhance complex medical reasoning offering promising directions for AI assistance in challenging clinical contexts.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0001106"},"PeriodicalIF":7.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12677565/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145679720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. DOI: 10.1371/journal.pdig.0001109
Prajwal Ghimire, Keyoumars Ashkan
The term "artificial" implies an inherent dichotomy from the natural or organic. However, AI, as we know it, is a product of organic ingenuity-designed, implemented, and iteratively improved by human cognition. The very principles that underpin AI systems, from neural networks to decision-making algorithms, are inspired by the organic intelligence embedded in human neurobiology and evolutionary processes. The path from "organic" to "artificial" intelligence in digital health is neither mystical nor merely a matter of parameter count-it is fundamentally about organization and adaption. Thus, the boundaries between "artificial" and "organic" are far less distinct than the nomenclature suggests.
{"title":"From artificial to organic: Rethinking the roots of intelligence for digital health.","authors":"Prajwal Ghimire, Keyoumars Ashkan","doi":"10.1371/journal.pdig.0001109","DOIUrl":"10.1371/journal.pdig.0001109","url":null,"abstract":"<p><p>The term \"artificial\" implies an inherent dichotomy from the natural or organic. However, AI, as we know it, is a product of organic ingenuity-designed, implemented, and iteratively improved by human cognition. The very principles that underpin AI systems, from neural networks to decision-making algorithms, are inspired by the organic intelligence embedded in human neurobiology and evolutionary processes. The path from \"organic\" to \"artificial\" intelligence in digital health is neither mystical nor merely a matter of parameter count-it is fundamentally about organization and adaption. Thus, the boundaries between \"artificial\" and \"organic\" are far less distinct than the nomenclature suggests.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 12","pages":"e0001109"},"PeriodicalIF":7.7,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12668481/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-26. eCollection Date: 2025-11-01. DOI: 10.1371/journal.pdig.0000951
John Tayu Lee, Sheng Hui Hsu, Vincent Cheng-Sheng Li, Kanya Anindya, Meng-Huan Chen, Charlotte Wang, Toby Kai-Bo Shen, Valerie Tzu Ning Liu, Hsiao-Hui Chen, Rifat Atun
Machine learning (ML) models are increasingly applied to predict body mass index (BMI) and related outcomes, yet their fairness across socioeconomic and caste groups remains uncertain, particularly in contexts of structural inequality. Using nationally representative data from more than 55,000 adults aged 45 years and older in the Longitudinal Ageing Study in India (LASI), we evaluated the accuracy and fairness of multiple ML algorithms (Random Forest, XGBoost, Gradient Boosting, LightGBM, Deep Neural Networks, and Deep Cross Networks) alongside logistic regression for predicting underweight, overweight, and central adiposity. Models were trained on 80% of the data and tested on 20%, with performance assessed using AUROC, accuracy, sensitivity, specificity, and precision. Fairness was evaluated through subgroup analyses across socioeconomic and caste groups and equity-based metrics such as Equalized Odds and Demographic Parity. Feature importance was examined using SHAP values, and bias-mitigation methods were implemented at pre-processing, in-processing, and post-processing stages. Tree-based models, particularly LightGBM and Gradient Boosting, achieved the highest AUROC values (0.79-0.84). Incorporating socioeconomic and health-related variables improved prediction, but fairness gaps persisted: performance declined for scheduled tribes and lower socioeconomic groups. SHAP analyses identified grip strength, gender, and residence as key drivers of prediction differences. Among mitigation strategies, Reject Option Classification and Equalized Odds Post-processing moderately reduced subgroup disparities but sometimes decreased overall performance, whereas other approaches yielded minimal gains. ML models can effectively predict obesity and adiposity risk in India, but addressing bias is essential for equitable application. Continued refinement of fairness-aware ML methods is needed to support inclusive and effective public-health decision-making.
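As a concrete illustration of the subgroup fairness checks described above, the sketch below computes per-group true- and false-positive rates and the resulting equalized-odds gaps from binary predictions. The arrays and group labels are synthetic placeholders, not LASI data or the study's code.

```python
# Equalized-odds gap: maximum difference in TPR and FPR across groups.
# Arrays below are synthetic placeholders, not LASI data.
import numpy as np

def rates(y_true, y_pred):
    # TPR and FPR for binary labels/predictions (NaN if a class is absent).
    tpr = np.mean(y_pred[y_true == 1]) if np.any(y_true == 1) else np.nan
    fpr = np.mean(y_pred[y_true == 0]) if np.any(y_true == 0) else np.nan
    return tpr, fpr

def equalized_odds_gap(y_true, y_pred, groups):
    per_group = {g: rates(y_true[groups == g], y_pred[groups == g])
                 for g in np.unique(groups)}
    tprs = [v[0] for v in per_group.values()]
    fprs = [v[1] for v in per_group.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs), per_group

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, 1000)
    yhat = rng.integers(0, 2, 1000)
    grp = rng.choice(["group_a", "group_b", "group_c"], 1000)  # placeholder subgroups
    tpr_gap, fpr_gap, detail = equalized_odds_gap(y, yhat, grp)
    print("TPR gap:", round(tpr_gap, 3), "FPR gap:", round(fpr_gap, 3))
```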
{"title":"Evaluating algorithmic fairness of machine learning models in predicting underweight, overweight, and adiposity across socioeconomic and caste groups in India: evidence from the longitudinal ageing study in India.","authors":"John Tayu Lee, Sheng Hui Hsu, Vincent Cheng-Sheng Li, Kanya Anindya, Meng-Huan Chen, Charlotte Wang, Toby Kai-Bo Shen, Valerie Tzu Ning Liu, Hsiao-Hui Chen, Rifat Atun","doi":"10.1371/journal.pdig.0000951","DOIUrl":"https://doi.org/10.1371/journal.pdig.0000951","url":null,"abstract":"<p><p>Machine learning (ML) models are increasingly applied to predict body mass index (BMI) and related outcomes, yet their fairness across socioeconomic and caste groups remains uncertain, particularly in contexts of structural inequality. Using nationally representative data from more than 55,000 adults aged 45 years and older in the Longitudinal Ageing Study in India (LASI), we evaluated the accuracy and fairness of multiple ML algorithms-including Random Forest, XGBoost, Gradient Boosting, LightGBM, Deep Neural Networks, and Deep Cross Networks-alongside logistic regression for predicting underweight, overweight, and central adiposity. Models were trained on 80% of the data and tested on 20%, with performance assessed using AUROC, accuracy, sensitivity, specificity, and precision. Fairness was evaluated through subgroup analyses across socioeconomic and caste groups and equity-based metrics such as Equalized Odds and Demographic Parity. Feature importance was examined using SHAP values, and bias-mitigation methods were implemented at pre-processing, in-processing, and post-processing stages. Tree-based models, particularly LightGBM and Gradient Boosting, achieved the highest AUROC values (0.79-0.84). Incorporating socioeconomic and health-related variables improved prediction, but fairness gaps persisted: performance declined for scheduled tribes and lower socioeconomic groups. SHAP analyses identified grip strength, gender, and residence as key drivers of prediction differences. Among mitigation strategies, Reject Option Classification and Equalized Odds Post-processing moderately reduced subgroup disparities but sometimes decreased overall performance, whereas other approaches yielded minimal gains. ML models can effectively predict obesity and adiposity risk in India, but addressing bias is essential for equitable application. Continued refinement of fairness-aware ML methods is needed to support inclusive and effective public-health decision-making.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 11","pages":"e0000951"},"PeriodicalIF":7.7,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12654920/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-24. eCollection Date: 2025-11-01. DOI: 10.1371/journal.pdig.0001075
R Constance Wiener, Bayan J Abuhalimeh
Online/digital health literacy is important for individuals to evaluate how online health information influences their care and their consent to treatment. The purpose of this systematic review is to examine the level of digital health literacy among adults in studies that used the eHealth Literacy Scale (eHEALS) as a measure of digital health literacy. The authors searched Google Scholar, PubMed, Scopus, and Web of Science for evidence, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement. Included were articles in which the researchers evaluated the level of digital health literacy using eHEALS, that were peer reviewed, written in English or with an English translation provided, and published between 2020 and 2025. The search initially identified 200 articles; 180 were excluded, resulting in a sample of 20 publications. eHEALS scores, which can range from 8 to 40, had a weighted mean of 24.3 (95% CI: 17.1-31.6). The lowest mean score was 12.57 and the highest was 35.1. The highest eHEALS score was from a qualitative interview study. Nine other studies reported overall means of 30 or higher, and three reported scores below 20. Globally, there is a wide range of reported digital health literacy levels. It is critical that the public gains skill and confidence in digital health literacy for healthcare decisions. The results of this study provide evidence of a wide range of digital health literacy levels.
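The pooled figure reported above is a weighted mean of study-level eHEALS scores. As a small worked illustration of a sample-size-weighted mean (with made-up study means and sample sizes, not the review's 20 studies):

```python
# Sample-size-weighted mean of study-level eHEALS means; the numbers below are
# illustrative placeholders, not the studies included in the review.
import numpy as np

study_means = np.array([13.2, 22.4, 28.9, 31.2, 34.8])   # placeholder means (8-40 scale)
sample_sizes = np.array([150, 320, 210, 480, 95])         # placeholder sample sizes

weighted_mean = np.average(study_means, weights=sample_sizes)
print("weighted mean eHEALS score:", round(weighted_mean, 1))
```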
{"title":"Evaluating adult digital health literacy, 2020-2025: A systematic review.","authors":"R Constance Wiener, Bayan J Abuhalimeh","doi":"10.1371/journal.pdig.0001075","DOIUrl":"10.1371/journal.pdig.0001075","url":null,"abstract":"<p><p>Online/digital health literacy is important for individuals to evaluate the influence of such input in their care and consent for treatment. The purpose of this systematic review is to examine the digital health literacy level among adults in studies that used the eHealth Literacy Scale (eHEALS) as a measure of digital health literacy. The authors searched Google Scholar, PubMed, Scopus, and Web of Science for evidence following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis Statement, 2020 (PRISMA). Included were articles in which the researchers evaluated the level of digital health literacy using eHEALS, were peer reviewed, written in English or in which English translation was provided, and were published between 2020-2025. There were 200 articles initially identified in the search, 180 were excluded resulting in a sample of 20 publications. EHEALS scores, with possibilities from 8-40, had a weighted mean of 24.3 (95%CI: 17.1-31.6). The lowest mean score was 12.57; and the highest mean score was 35.1. The highest eHEALS score was from a qualitative interview study. Nine other studies reported overall means ≥ 30. There were three with eHEALS scores below 20. Globally, there is a wide range of reported digital health literacy levels. It is critical that the public gains skill and confidence in digital health literacy for healthcare decisions. The results of this study provide evidence of a large range of digital health literacy.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 11","pages":"e0001075"},"PeriodicalIF":7.7,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12643288/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145598222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-21. eCollection Date: 2025-11-01. DOI: 10.1371/journal.pdig.0001107
Shahmir H Ali, Hein Thu
{"title":"App fatigue in mHealth: Beyond improving apps, advance equity by meeting people where they are.","authors":"Shahmir H Ali, Hein Thu","doi":"10.1371/journal.pdig.0001107","DOIUrl":"10.1371/journal.pdig.0001107","url":null,"abstract":"","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 11","pages":"e0001107"},"PeriodicalIF":7.7,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12637926/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145575094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}